This report provides a compilation of maps and spatial assessments of bathymetry, surficial sediments, 
oceanographic habitat variables, deep sea corals, and seabirds for offshore waters of New York. The 
priority scientific needs, plan the analytical approach, assess existing data, and compile findings into this 
report. The targeted users of this report are coastal managers at OGLP, but other State and federal 
decision-makers, offshore renewable energy development interests and environmental advocates will also find the 
information useful. 



The presented data and maps are the most accurate and up-to-date ecological information available for the 
study area. The diverse ecological themes which are treated here represent priority data gaps and were 
requested by OGLP to better understand and balance ocean uses and environmental conservation. The data 
will feed into a larger project led by OGLP to compile and assess existing data for offshore spatial planning. 

NCCOS is a recognized scientific leader in developing biogeographic assessments. These assessments are 
organized around the development of geospatial data layers for ecological parameters, integrated analyses, 
and specific quantitative products to aid in resource management. The spatial analyses in this report build on 
and advance existing biogeographic techniques developed by NCCOS for other coastal and marine areas, 
including the Gulf of Maine, North and Central California, and the Northwestern Hawaiian Islands. This report, 
along with similar biogeographic products from around the nation, is also available online. For more information 
Chris Caldow 
Executive Summary 



This report provides a compilation of new maps 
and spatial assessments for seabirds, bathymetry, 
surficial sediments, deep sea corals, and 
oceanographic habitats in support of offshore spatial 
planning led by the New York Department of State 
Ocean and Great Lakes Program. These diverse 
ecological themes represent priority information 
gaps left by past assessments and were requested 
by New York to better understand and balance 
ocean uses and environmental conservation 
in the Atlantic. The main goal of this report is to 
translate raw ecological, geomorphological and 
oceanographic data into maps and assessments 
that can be easily used and understood by coastal 
managers involved in offshore spatial planning. 

New York plans to integrate information in this report with other ecological, geophysical and human use 
data to obtain a broad perspective on the ocean environment, human uses and their interactions. New York will 
then use this information in an ecosystem-based framework to coordinate and support decisions balancing 
competing demands in its offshore environment, and ultimately develop a series of amendments to New York's 
federally approved Coastal Management Program. 




Image 0.1. Roseate Tern (Sterna dougallii) in flight. This species is 
endangered in the Mid-Atlantic. Photo credit: David Pereksta, BOEM. 



The targeted users of this report and the compiled spatial information are New York coastal managers, but 
other State and federal decision-makers, offshore renewable energy development interests and environmental 
advocates will also find the information useful. In addition, the data and approaches will be useful to regional 
spatial planning initiatives set up by the Mid-Atlantic Regional Council on the Ocean (MARCO) and federal 
regional planning bodies for coastal and marine spatial planning. 

This report represents a synthesis of existing information rather than a new data collection effort. Given the short 
time frame over which management decisions frequently need to be made and the high cost of new natural 
resource surveys, this approach may be one other coastal zone managers will want to consider. The data 
and maps were developed by employing a spatial analytical approach which applies predictive modeling and 
geostatistics to interpolate among data, identify important spatial patterns and develop continuous distribution 
maps of species and physical resources at fine spatial resolutions required to support spatial management 
decisions. This analytical approach also allows for a quantitative description of prediction certainty, a useful 
parameter for spatial planning. For example, maps of prediction reliability can be used to target efforts to 
collect new survey data to fill information gaps, or to incorporate measures of certainty into risk assessments. 

In Chapter 2, a new bathymetric model with spatially-explicit certainty estimates is presented which builds on 
previous predictive bathymetric modeling approaches in the region. The new model provides a continuous 
gridded bathymetric surface for the study area, and allows users to view and explore spatial variation in the 
vertical accuracy of depth predictions. The new model is similar to the National Oceanic and Atmospheric 
Administration's Coastal Relief Model but provides estimates of prediction certainty, which can be used to 
prioritize locations for new bathymetric surveys and better understand the reliability of depth predictions and 
derived spatial layers (e.g., benthic habitats, positions of depth contours). 



Predictive models of mean grain size, sediment composition, and hard bottom occurrence were developed 
to map the distribution of surficial sediments and habitats on the seafloor. These new models, presented in 
Chapter 3, build upon the data compilations and analytical frameworks laid out by existing work in the area. For 
mean grain size and sediment composition (e.g., mud, sand, gravel), the models provide continuous, gridded, 
spatially-explicit prediction surfaces and corresponding certainty estimates. A hard bottom occurrence model 
also provides a continuous gridded prediction surface representing the likelihood of hard bottom habitats. 

In Chapter 5, the locations of deep sea coral and sponge records within NOAA's Deep Sea Coral Research 
and Technology Program's geodatabase are examined within the New York Bight. Predictive models were not 
developed for deep sea corals or sponges because of limitations on the quantity and type of data available within 
the New York study area. Instead, we focused on mapping known locations of deep sea corals and sponges, 
discussing their role as important habitat for other marine organisms, and discussing and summarizing past 
studies in the region. 

In Chapter 6, new maps of the seasonal and annual distributions of seabird species and seabird ecological 
groups are provided. These distributions were predicted based on statistical models fit to visual shipboard 
seabird observational data collected as part of a standardized survey program from 1980-1988. Species and 
group distributional models were then combined to produce "hotspot" maps depicting multi-species abundance 
and diversity patterns. Spatial predictors included long-term archival satellite, oceanographic, hydrographic, 
and biological datasets. Seabird distributional maps for seasonal and annual relative indices of occurrence and 
abundance were produced, accompanied by maps depicting metrics of certainty. The resolution of predictions 
is approximately 400 times finer than previous 1 arc-minute maps of seabird distribution in this region. These 
high-resolution, contiguous predictive maps of seabird distribution are expected to be useful contributions to 
offshore spatial planning, particularly of activities that may affect seabirds or their habitats. 
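The text does not spell out how the species and group models were combined into hotspot maps. As a rough illustration only, the sketch below assumes two simple cell-by-cell summaries: summed predicted relative abundance, and a count of species predicted above a presence threshold. The function name `hotspot_layers`, the threshold and the toy numbers are hypothetical and are not the report's method or data.

```python
"""Hypothetical hotspot summary: combine per-species predicted relative
abundance grids into multi-species abundance and richness layers."""
from typing import Dict, List, Tuple

Grid = List[float]  # one value per grid cell, same cell order for every species


def hotspot_layers(predictions: Dict[str, Grid],
                   presence_threshold: float = 0.0) -> Tuple[List[float], List[int]]:
    """Return (total_abundance, richness) per cell.

    total_abundance: sum of predicted relative abundance across species.
    richness: number of species whose prediction exceeds presence_threshold.
    """
    n_cells = len(next(iter(predictions.values())))
    total = [0.0] * n_cells
    richness = [0] * n_cells
    for species, grid in predictions.items():
        if len(grid) != n_cells:
            raise ValueError(f"grid for {species} has the wrong number of cells")
        for i, value in enumerate(grid):
            total[i] += value
            if value > presence_threshold:
                richness[i] += 1
    return total, richness


# Toy predictions for three cells (made-up numbers, not report data).
preds = {
    "common_loon": [0.0, 1.5, 3.0],
    "northern_gannet": [0.5, 0.0, 2.0],
    "roseate_tern": [0.0, 0.0, 0.2],
}
total, richness = hotspot_layers(preds)
print(total, richness)
```

Whatever the actual metric, the key design point is that every species grid must share the same cell order, so the layers can be combined cell by cell.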





Image 1.1. Offshore wind farm. 

Photo credit: A. Meskens (Wikimedia Commons) 



Introduction 

Charles Menza¹, Chris Caldow¹, Jeff Herter², and Greg Capobianco² 

1.1. A CALL FOR SPATIAL PLANNING OFFSHORE OF NEW YORK 

New York depends on healthy coastal and marine 
ecosystems for its thriving economy and vibrant 
communities. These ecosystems support critical 
habitats for wildlife and a growing number of 
significant and often competing ocean uses and 
activities, such as fishing, commercial transportation, 
recreational boating and energy production. 
Planners, policy makers and resource managers 
are being challenged to sustainably balance ocean 
uses and environmental conservation in a finite 
space and with limited information. Solutions to 
these challenges are complicated by emerging 
industries, climate change, and a growing coastal 
population with shifting needs. 

New York is addressing competition and evolving 
threats to coastal and marine resources and 
services by compiling spatial information and 
applying ecosystem-based management to spatial 
planning. In 2006, the New York Oceans and Great Lakes Ecosystem Conservation Act created the New York 
Oceans and Great Lakes Ecosystem Conservation Council and charged the New York Department of State (DOS) 
with developing amendments to its federally approved Coastal Management Program to better manage human 
activities that impact coastal and marine ecosystems. 

The Coastal Management Program within DOS has broad authority to guide human uses and can use the 
consistency determination process, outlined in the Coastal Zone Management Act of 1972 (Public Law 92-583, 
16 U.S.C. 1451-1456), to affect decisions made in both federal and state waters. Ultimately, the amendments 
will be integrated into state and federal permitting processes related to siting ocean uses and into regional 
ocean planning programs. A state with an approved Coastal Management Program has the authority to approve or 
deny a proposed federal action if it may affect the state's coastal resources. 

DOS is taking a phased approach for developing amendments by focusing on the most pressing issues first. 
New York's first amendment will apply to the Atlantic waters off New York out to the continental shelf and will 
focus on guiding decisions for new clean, renewable energy production and transmission, while addressing 
conflicts with other human activities and protecting critical habitats. Future amendments will include Long 
Island Sound and the Great Lakes. 

New York has joined a growing number of states and federal agencies thinking about offshore spatial planning. 
For instance, Massachusetts and Rhode Island have recently completed ocean management plans, and New 
Jersey, Oregon and California are in the process of developing plans or collecting information necessary 
for planning purposes. In addition to state-level planning initiatives, multi-state partnerships and the federal 
government are undertaking spatial planning and have adopted a regional approach. The regional approach 
was chosen to allow for the variability of economic, environmental, and social aspects among different areas, 
provide an ecosystem-based perspective, and match existing regional governance structures. 



1 Center for Coastal Monitoring and Assessment, National Centers for Coastal Ocean Science, National Ocean Service, National 
Oceanic and Atmospheric Administration 

2 Ocean and Great Lakes Program, New York Department of State 



In 2009, the governors of New York, New Jersey, Delaware, Maryland and Virginia committed to a comprehensive 
regional approach to address challenges faced in the ocean waters of the Mid-Atlantic, and created the 
Mid-Atlantic Regional Council on the Ocean (MARCO). The council has since developed action teams to protect 
critical habitats, improve water quality, support sustainable development of renewable energy, prepare for 
climate change, and build capacity for effective spatial planning in the region. Many of the data and analytical 
approaches used in this report will likely be useful to the entire Mid-Atlantic region. 

The MARCO initiative fits in well with the first-ever National Ocean 
Policy signed by President Obama in 2010 (Executive Order 13547, 
2010). The policy seeks to improve stewardship of the oceans, coasts, 
and Great Lakes by way of: adopting ecosystem-based management; 
obtaining, advancing, using, and sharing the best science and data; 
promoting efficiency and collaboration; and strengthening regional 
efforts. The order established the National Ocean Council to guide 
implementation of the policy, and identified nine national priority 
objectives, one of which is to implement coastal and marine spatial 
planning (CMSP). The Council outlined a flexible framework for spatial 
planning that is regional in scope, developed cooperatively among 
federal, state, tribal, and local authorities, and includes substantial 
stakeholder, scientific, and public input (NOC, 2012). 



According to U.S. Executive Order 13547, CMSP is a "comprehensive, adaptive, integrated, ecosystem-based, 
and transparent spatial planning process, based on sound science, for analyzing current and anticipated 
uses.... [CMSP] identifies areas most suitable for various types or classes of activities in order to reduce 
conflicts among uses, reduce environmental impacts, facilitate compatible uses, and preserve critical 
ecosystem services to meet economic, environmental, security, and social objectives." 



1.2. DATA TO SUPPORT OFFSHORE SPATIAL PLANNING 

New York requires accurate, accessible and integrated ecological and 
human use data in order to base spatial planning on sound science. 
Whenever possible, these data are needed at spatial and temporal 
scales that are in line with management decisions, and need to provide continuous information over the whole 
management domain. With these requirements in mind, over the past year New York has compiled diverse 
ecological and human use datasets, including: biogeographic data from The Nature Conservancy's Northwest 
Atlantic Marine EcoRegional Assessment (NAMERA); distributions of marine fishes, marine mammals and 
sea turtles from Stone Environmental Inc., the University of Rhode Island, the New England Aquarium, and 
the National Marine Fisheries Service's Northeast Fisheries Science Center; infrastructure data, chiefly from 
NOAA electronic navigational charts; jurisdictional information downloaded from the Multi-purpose Marine 
Cadastre (MMC), a tool developed in collaboration between the NOAA Coastal Services Center (CSC) and 
DOI's Bureau of Ocean Energy Management (BOEM, formerly the Bureau of Ocean Energy Management, 
Regulation and Enforcement, BOEMRE); and offshore human use information collected through participatory 
geographic information system workshops developed and carried out in partnership between the New York 
State Coastal Management Program and CSC. 

This report supplements other datasets and reports compiled by the Ocean and Great Lakes Program (OGLP), 
and provides data identified by OGLP as a priority to satisfy the needs of a Coastal Management Program 
amendment in the Atlantic. Specifically, this report examines the spatial distribution of: seabirds, bathymetry, 
surficial sediments, deep sea corals, and dynamic oceanographic habitats. We developed new geospatial 
synthesis products with the objective of providing: 

• The most accurate and up-to-date information available, 

• Continuous information over the management domain and at the finest spatial scale raw data would 
support, 

• Estimates of synthesis product reliability (certainty) and assessments of data quality, 

• Data products in digital formats that allow easy integration with other datasets in a geographic 
information system, and 

• Maps, assessments and interpretations that are easily understood and used by coastal managers 
to support spatial management decisions. 



All data and assessments in this report represent a synthesis of existing information rather than a new data 
collection effort. Given the short time frame over which management decisions frequently need to be made and 
omnipresent budget constraints, this approach may be one that other coastal zone managers will want to consider. 

1.3. AN ANALYTICAL APPROACH USEFUL TO SPATIAL PLANNING 

The ocean area offshore of New York has a significant amount of raw data, ranging from sediment samples 
to bird observations to ocean temperature profiles. But many of these datasets are spatially and temporally 
limited or exist only as scattered points. As such, they are difficult to use for spatial planning, especially when 
decisions must be made in locations that are in-between surveys, have few surveys, have widely varying 
measurements or require a regional context. Where possible, we overcame these challenges by using a 
spatial analytical approach which applied statistical modeling to generalize from scattered sets of data points 
to regional maps of important patterns and processes. 

Not all data can support this type of spatial analytical approach, especially datasets with few observations 
and/or with unknown sampling effort. For instance, predictive coral and sponge distribution models could not 
be developed in this report (Chapter 5) due to these data limitations. In this case, the goal was not to make 
spatial predictions, but rather to compile the most up-to-date observations and develop maps providing the 
best available information to make management decisions. 

In the remaining chapters, datasets for bathymetry (Chapter 2), surficial sediments (Chapter 3), dynamic 
oceanographic habitats (Chapter 4), and seabirds (Chapter 6) included sufficient information to develop reliable 
spatial models. In-depth discussions of the statistical methods used to convert observation point data into 
continuous surfaces are available in corresponding chapters. A generalized representation of the approach 
using actual data (common loon sightings) is presented in Figure 1.1. 

The spatial analytical approach follows Cressie (1993) and Hengl et al. (2007), where the variables of interest 
are modeled as a linear combination of components representing a deterministic mean trend, a spatially 
structured random process, and non-spatially structured error. The deterministic mean trend is estimated using 
a suitable broad spatial-scale function (generalized linear model for seabirds, or a smoothing function for 
bathymetry and surficial sediments) and the spatially structured random process and error term are estimated 
by geostatistical analysis of the residuals. There is no loss of information in this approach since the residuals 
contain all of the information removed from the trend surface. 
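In the notation of Hengl et al. (2007), this decomposition can be written z(s) = m(s) + e'(s) + e'', where m is the deterministic trend, e' the spatially structured residual and e'' the non-spatial error. The Python below is a minimal regression-kriging sketch under stated assumptions: a straight-line ordinary least-squares trend in one covariate stands in for the GLM or smoothing functions used in this report, and the residuals are interpolated by simple kriging with an exponential covariance whose sill, range and nugget are fixed by hand rather than fitted from a variogram. All function names are hypothetical, and the returned variance covers only the kriged residual component, not uncertainty in the trend coefficients.

```python
"""Minimal regression-kriging sketch (illustrative assumptions only).

z(s) = m(s) + e'(s) + e''
  m(s)  : deterministic trend, here an OLS line in one covariate
  e'(s) : spatially structured residual, here simple kriging (exponential covariance)
  e''   : nugget, i.e. non-spatial error, added to the covariance diagonal
"""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend(covar: Sequence[float], z: Sequence[float]) -> Tuple[float, float]:
    """Deterministic trend m(s): OLS line z = b0 + b1 * covariate."""
    n = len(z)
    mx, mz = sum(covar) / n, sum(z) / n
    sxx = sum((x - mx) ** 2 for x in covar)
    sxz = sum((x - mx) * (v - mz) for x, v in zip(covar, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1


def covariance(p: Point, q: Point, sill: float, corr_range: float) -> float:
    """Exponential covariance of the spatially structured residual e'(s)."""
    return sill * math.exp(-math.dist(p, q) / corr_range)


def regression_krige(pts: Sequence[Point], covar: Sequence[float], z: Sequence[float],
                     target: Point, target_covar: float, sill: float = 1.0,
                     corr_range: float = 10.0, nugget: float = 0.1) -> Tuple[float, float]:
    """Predict z at target as trend + simple-kriged residual; also return residual variance."""
    b0, b1 = fit_trend(covar, z)
    # OLS residuals have mean zero, so simple kriging with a known zero mean applies.
    resid = [v - (b0 + b1 * x) for x, v in zip(covar, z)]
    n = len(pts)
    c_obs = [[covariance(pts[i], pts[j], sill, corr_range) + (nugget if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    c_tgt = [covariance(p, target, sill, corr_range) for p in pts]
    w = solve(c_obs, c_tgt)  # simple-kriging weights
    trend = b0 + b1 * target_covar
    pred = trend + sum(wi * ri for wi, ri in zip(w, resid))
    var = sill - sum(wi * ci for wi, ci in zip(w, c_tgt))  # ignores trend-estimation error
    return pred, var


# Toy example: depth (m) increasing with a hypothetical offshore coordinate x.
pts = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
xs = [p[0] for p in pts]
depth = [10.0, 20.5, 31.0, 11.0, 19.0]
pred, var = regression_krige(pts, xs, depth, target=(2.5, 2.5), target_covar=2.5)
print(f"prediction {pred:.2f} m, residual variance {var:.3f}")
```

Far from any observation the kriging weights shrink toward zero, so the prediction falls back to the trend and the variance approaches the sill; this is the behaviour that makes the residual variance usable as a map of certainty. In practice the covariance parameters would be estimated from an empirical variogram of the residuals.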

The result is a spatially-explicit distribution of predicted outcomes, whether the outcomes are of abundance 
or the likelihood of occurrence. This predicted distribution of outcomes has two uses. First, the mean of the 
distribution can be mapped and used to represent the most likely outcome for a given location. 
Second, the distribution provides an estimate of certainty for the mapped outcome. That is, the mapped 
prediction for an area with a narrow distribution (outcomes are similar) has greater certainty than the prediction 
for an area with a wide distribution (outcomes are dissimilar). Knowledge of a prediction's certainty is a useful 
measure in spatial planning, because it allows planners to use the best available data to make decisions with 
an understanding of limitations on generalizations that can be made from the available data. We use the terms 
reliability, certainty and uncertainty throughout this report. 
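A small sketch of the two uses described above, assuming each grid cell already holds a set of plausible predicted outcomes (for example, from simulations; a hypothetical input here). The mean is mapped as the most likely outcome and the standard deviation serves as one possible measure of spread, so a narrow distribution indicates greater certainty. The choice of standard deviation, the function name `summarize`, and the cell values are illustrative assumptions.

```python
"""Summarize a distribution of predicted outcomes per cell (illustrative)."""
import statistics
from typing import Dict, List


def summarize(outcomes: List[float]) -> Dict[str, float]:
    """Mean = most likely outcome; standard deviation = spread (uncertainty)."""
    return {"mean": statistics.fmean(outcomes), "sd": statistics.stdev(outcomes)}


cells = {
    "narrow": [4.8, 5.0, 5.2, 4.9, 5.1],  # similar outcomes -> greater certainty
    "wide": [1.0, 9.0, 3.0, 7.0, 5.0],    # dissimilar outcomes -> lower certainty
}
summary = {name: summarize(vals) for name, vals in cells.items()}
for name, s in summary.items():
    print(f"{name}: mean {s['mean']:.2f}, sd {s['sd']:.2f}")
```

Both toy cells share the same mean, so a map of means alone would not distinguish them; only the spread reveals that the "wide" cell is far less reliable.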

The applied spatial predictive methods involve a number of statistical assumptions, and it is important to 
note that the accuracy of model predictions and estimates of certainty depend to varying degrees on these 
assumptions being met. A complete discourse on all statistical assumptions is beyond the scope of this report 
(for detailed discussions see the methodological citations in each of the individual analytical chapters of this 
report), but several general assumptions are: 

• Spatial patterns and sampling effort are constant over the analyzed timeframe 

To compile sufficient data to make predictions, we integrated data over several years. This approach 
provides information on the long-term average state of the system, but ignores long-term trends or cycles. 
Figure 1.1: These four panels show the general analytical approach used in this report to develop continuous distribution maps, assess 
certainty, and make easily understood products from typical survey data. This example uses data from the Manomet Bird Observatory 
Seabird and Cetacean Assessment Program database. Panel A shows common loon sightings distributed across the study area. A 
clear spatial pattern is difficult to discern, since sampling effort is irregular and observed presences are dispersed among observed 
absences. This is a typical ecological pattern, since seabirds move around, detection is not perfect and sampling effort is irregular. Panel 
B shows the continuous output from a predictive model which has linked observations of the common loon to environmental 
predictors such as sea surface temperature, depth and oceanographic productivity. The model displays the average likelihood of observing 
a common loon given these environmental linkages and fills in gaps where survey data are missing. Panel C displays the uncertainty 
related to the predictive model, where areas of most uncertainty indicate the greatest range in possible predicted outcomes. Model 
uncertainty is commonly greatest where the resource of interest is most variable or where few data are available. Panel D shows a map 
where certainty (the inverse of uncertainty) is draped over predicted relative abundance. This type of map was requested by coastal 
resource managers in OGLP because it was easy to understand and use for spatial management decisions. 



• Resources and species are precisely detected and measured 

Species or resources are seldom perfectly detectable, meaning corresponding occurrence and abundance 
estimates will be biased compared to true abundance and occurrence values. When sampling effort 
is known and heterogeneous, values can be standardized by effort to allow relative comparisons, but 
difficulties still arise in assessing areas where little sampling effort was devoted. 

• There exists a constant relationship between sampling effort, relative indices of occurrence 
and abundance, and true values of occurrence and abundance 

Not only are species and resources unlikely to be perfectly detectable, the relationship between relative 
indices of occurrence and abundance and the true values of occurrence and abundance could vary in 
time and space, depending on differences in observers, weather conditions, animal behavior, etc. Such 
variation introduces an unaccounted-for source of measurement error into data, and it is not possible to 
correct for all such sources of variation. 
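As a minimal illustration of effort standardization, the sketch below divides counts by the area surveyed in each cell and withholds a rate where effort falls below a minimum, reflecting the difficulty of assessing lightly sampled areas. The function `per_unit_effort`, the km² effort unit, the 1 km² cutoff and the counts are illustrative assumptions; standardizing this way yields relative indices and does not correct for imperfect detection.

```python
"""Standardize raw counts by sampling effort (illustrative only)."""
from typing import List, Optional


def per_unit_effort(counts: List[int], effort_km2: List[float],
                    min_effort: float = 1.0) -> List[Optional[float]]:
    """Counts per km^2 surveyed; None where effort is below min_effort."""
    rates: List[Optional[float]] = []
    for n, e in zip(counts, effort_km2):
        rates.append(n / e if e >= min_effort else None)
    return rates


counts = [12, 3, 0, 5]
effort = [4.0, 0.5, 2.0, 10.0]  # km^2 surveyed in each cell (made-up)
print(per_unit_effort(counts, effort))
```

The second cell shows why a cutoff is useful: 3 birds in 0.5 km² would otherwise produce the highest rate in the set from very little survey effort.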

In addition to the assumptions inherent in modeling techniques, maps and assessments are a reflection of 
data quality, and we assume that the data are suitable for spatial modeling and representative of 
the ecosystem's true state. The key challenges of using existing data are that they were collected for a specific 
purpose, which may not be congruent with spatial analysis, and by definition they were collected in the past. It is 
important to understand potential limitations inherent to each dataset, and in each chapter we have identified 
and assessed key data quality issues. 

We understand that statistical and data quality assumptions may not be completely met, thus model validation 
is an important part of the modeling approach. Validation is usually done by cross-validation, a process in which 
some data are left out of model fitting and model predictions are tested against those data. Model predictions 
can also be tested against high-precision "ground-truth" datasets where such datasets are available. We use 
both methods to validate predictions and maps in this report. 
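The k-fold form of cross-validation can be sketched in a few lines of Python. Here a nearest-neighbour rule (`nearest_neighbour`, hypothetical) stands in for whatever model is being validated, and root-mean-square error on the held-out points is the assumed skill score; the individual chapters describe their own models and validation metrics.

```python
"""k-fold cross-validation sketch with a stand-in predictor (illustrative)."""
import math
import random
from typing import Callable, List, Sequence, Tuple

Loc = Tuple[float, float]
Obs = Tuple[Loc, float]  # (location, observed value)


def nearest_neighbour(train: Sequence[Obs], loc: Loc) -> float:
    """Hypothetical stand-in model: predict the value of the closest training point."""
    return min(train, key=lambda o: math.dist(o[0], loc))[1]


def k_fold_rmse(data: List[Obs], k: int,
                predict: Callable[[Sequence[Obs], Loc], float], seed: int = 0) -> float:
    """Hold out each of k folds in turn, predict it from the rest, and score the errors."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of observations")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err: List[float] = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            loc, observed = data[i]
            sq_err.append((predict(train, loc) - observed) ** 2)
    return math.sqrt(sum(sq_err) / len(sq_err))


# Toy transect: value equals position along a line (made-up data).
transect = [((float(i), 0.0), float(i)) for i in range(10)]
print(round(k_fold_rmse(transect, k=5, predict=nearest_neighbour), 3))
```

Because the model is passed in as `predict`, a different predictor can be validated with the same fold logic, and every observation is used exactly once as a held-out test point.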

1.4. DESCRIPTION OF THE STUDY AREA 

This report focuses on a study area in ocean waters off the coast of New York. The area covers a portion of the 
Mid-Atlantic Bight and much of the area characterized as the New York Bight. The study boundaries extend 
from the southern shores of Long Island to the edge of the continental shelf and from Nantucket Shoals to the 
shores of New Jersey (Figure 1.2). Both state and federal waters are included. 

The study area covers a "spatial planning area" chosen by OGLP, in which it will focus its planning 
efforts, as well as ocean waters immediately adjacent to the planning area. The spatial planning area includes 
New York's territorial sea and federal waters where natural phenomena and human activities can affect 
services and resources within the territorial sea. 

The majority of the study area is characterized by a broad continental shelf approximately 150-200 km wide. 
At its outer edge, the shelf meets the continental slope, an area 40-60 km wide with very steep slopes 
that extend to depths greater than 2 km. The most prominent topographic features in the study area are the Hudson 
Shelf Valley, which crosses the entire shelf, and several shelf-edge incisions made by submarine canyons. 
These topographic features alter the broad-scale hydrography of the region, are important to cross-shelf water 
movement and provide important benthic habitats which differ from the surrounding seascape (Cooper, 1987; 
Steimle et al., 1999). 

The seafloor on the shelf is composed mostly of sand, which grades to silt and clay in deeper areas (Poppe et 
al., 2005). The relatively homogeneous seafloor has sporadic relict sand and gravel ridges, exposed sandstone 
and bedrock, dumping sites, dredge disposal sites and artificial reefs (e.g., shipwrecks, lost cargo, submerged 
pipelines). Bottom sediments play critical roles as habitats for benthic organisms such as demersal fish, clams 
and corals, and in the storage and processing of settling organic matter. 
Figure 1.2: A map of the study area used in this report. Map produced by New York State Department of State. Note that, 
effective October 1, 2011, the Bureau of Ocean Energy Management, Regulation and Enforcement (BOEMRE) was renamed the 
Bureau of Ocean Energy Management (BOEM). 



The hydrography of the study area is characterized by a strong seasonal cycle, considerable freshwater input 
from rivers, storm-dominated sediment transport and interactions among large, distinct water masses which 
extend across the Northwest Atlantic (Townsend et al., 2006). These hydrographic characteristics, along 
with characteristics of the seafloor and geomorphological setting, produce patterns across multiple spatial 
and temporal scales in resources (e.g., fish, sand, renewable energy) and ecosystem services (e.g., coastal 
protection, tourism and transportation). 
Bathymetry 
2.1 SUMMARY 

A new bathymetric model with spatially-explicit uncertainty estimates was developed for the New York study 
area (Figure 1.1). The model builds on previous predictive bathymetric modeling approaches in the region (e.g., 
Calder, 2006), provides a continuous gridded bathymetric surface for the study area, and allows users to view 
and explore spatial variation in the vertical accuracy of depth predictions. The spatial resolution of the model 
is identical to that of the National Oceanic and Atmospheric Administration's (NOAA) Coastal Relief Model (CRM; 
horizontal resolution approximately 83.8 m) in the study area, and the model was built from the same database 
of hydrographic survey points. Unlike the CRM, the new geostatistical model provides estimates of prediction 
certainty, which can be used to prioritize locations for new bathymetric surveys and better understand the 
reliability of depth predictions and derived spatial layers (e.g., benthic habitats, positions of depth contours). 
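As a sketch of how certainty estimates could guide survey planning, the Python below ranks grid cells by their depth-prediction standard deviation and returns the most uncertain ones as candidate survey locations. Treating standard deviation as the uncertainty measure, the function name `survey_priorities`, and the grid values are assumptions for illustration, not outputs of the model described here.

```python
"""Rank grid cells for new surveys by prediction uncertainty (sketch)."""
import heapq
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) index into the gridded model


def survey_priorities(sd_grid: List[List[float]], n: int) -> List[Tuple[Cell, float]]:
    """Return the n cells with the largest depth-prediction standard deviation."""
    cells = ((sd, (r, c)) for r, row in enumerate(sd_grid) for c, sd in enumerate(row))
    return [(cell, sd) for sd, cell in heapq.nlargest(n, cells)]


sd_m = [[0.4, 0.5, 2.1],
        [0.3, 3.5, 1.2],
        [0.2, 0.6, 4.0]]  # made-up vertical uncertainty (m)
print(survey_priorities(sd_m, 2))
```

A real prioritization would likely weigh uncertainty against other factors, such as planned development sites or survey cost, but the ranking step itself stays this simple.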

2.2 BACKGROUND 

Bathymetry (also called seafloor topography) is 
an important base environmental layer for spatial 
planning since it influences both planning of human 
activities (e.g., construction, shipping) and many 
physical, chemical and ecological processes. For 
instance, reliable bathymetric information can 
simultaneously improve habitat conservation and 
energy development by supporting the identification 
of: 

• Unique or vulnerable benthic habitats 

• Distributions of rare or endangered species 

• Efficient corridors for transmission lines 

• Suitable sites for turbine platforms, and 

• Potential construction hazards 




Image 2.1. An example of a bathymetric surface in the New York 
Bight, showing the change in depth with distance from shore and the 
complexity of the seafloor across the shelf. The Hudson Shelf Valley 
is prominently visible in the center of the model as the area of darker 
blue extending from New York harbor (top left) towards the shelf edge 
(bottom right). Coastal managers and engineers use bathymetric 
surfaces to assess shipping lanes, identify fish habitats, lay undersea 
cables and find sand and gravel resources. The bathymetric surface 
shown here is the NOAA Coastal Relief Model (CRM), draped over 
a derived hillshade layer to highlight bathymetric variation. Terrestrial 
imagery is the ArcGIS Online World User Imagery layer (ESRI Online). 



Bathymetry can be measured by a range of instruments, which determine the precision, spatial resolution, 
extent and cost of bathymetric information and nautical charts. Until the latter half of the 20th century, lead 
lines dropped from a ship were used to estimate depths (Calder, 2006) and were compiled on charts to give 
a coarse-scale representation of the seafloor and identify navigation hazards. Lead lines were eventually 
replaced by more accurate vertical beam echosounders (VBES) and subsequently by multibeam echosounders 
(MBES). Modern MBES can collect millions of precise soundings efficiently and quickly, making possible high- 
resolution bathymetric maps that reveal fine-scale features of the seafloor (Calder, 2006). Horizontal positioning 
technologies have also advanced over the years from sextant-based navigation to modern GPS. 

When combined with backscatter information and validation samples, MBES data offer an unprecedented 
view of the composition and morphology of the seafloor at multiple spatial scales (Kostylev et al., 2001; Gardner 
et al., 2003). The States of Oregon and California recently collected new data to take advantage of insights 
provided by MBES, and these data have been integrated into state spatial planning initiatives, specifically the 
Oregon Territorial Sea Plan and California Marine Life Protection Act Initiative. On the East Coast, fewer states 
have comprehensive MBES coverage; this may be a reflection of the increased costs involved in surveying 
comparatively wide, shallow continental shelves. 

About 20% of the New York planning area is covered by MBES surveys (Figure 2.1) which have been collected 
by the United States Geological Survey (USGS), NOAA, and the Woods Hole Oceanographic Institution (Schwab 
et al., 1997a and 1997b; Butman et al., 1998; Goff et al., 1999; Butman et al., 2006). The corresponding 
data have helped researchers map benthic habitats and identify physical features on the seafloor within the 
footprints of surveys. 



Unfortunately, the incomplete distribution of multibeam surveys limits their usefulness for understanding 
the relative distribution of habitats, features, processes and species over the entire planning area, a critical 
component of integrated marine spatial planning. The U.S. Coastal Relief Model developed by the National 
Geophysical Data Center (NGDC; http://www.ngdc.noaa.gov/mgg/coastal/model.html) offers a 3-arc-second 
continuous bathymetric model that covers the majority of the study area (including all of the continental shelf 
and slope). The CRM is derived from the largest single compilation of bathymetric soundings for U.S. coastal 
waters and has been used effectively to inform spatial planning on the East Coast as part of the Massachusetts 
Ocean Plan and Rhode Island Special Area Management Plan. Although coarser than multibeam data, the ~84 m 
horizontal resolution of the CRM is still sufficient to resolve general features of interest for marine spatial 
planning (e.g., canyons, ridges, sand waves, bathymetric contours). 

The portion of the CRM that overlaps the New York planning area was produced in 1999 and is a compilation of 
historical hydrographic surveys, collected using VBES and MBES from various data sources, including NOAA, 
USGS, the U.S. Army Corps of Engineers, and various academic institutions. Although compiled surveys are 
brought together under a common spatial framework, they possess different spatial footprints and resolutions, 
and were collected using different instruments. Newer surveys commonly overlap, adjoin and supersede older 
surveys. 
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Generally, the CRM is used in resource management applications assuming the depth measurements and 
predictions to be accurate, but significant uncertainty in model depth estimates arises from measurement error 
in hydrographic surveys, methods used to interpolate between survey points and data processing (discussed 
in detail in Calder, 2006 and references therein). These errors are variable over the study area (Figure 2.2) 
due to various factors, including survey age, processing guidelines, and distances amongst soundings. Since 
bathymetric errors in the CRM are not quantified, users cannot know whether depth predictions at a given 
location are likely to deviate from the true value by a few centimeters or hundreds of meters. Disregarding 
uncertainty might be acceptable for some analyses conducted at coarse spatial resolutions, but is problematic 
for finer resolution analyses and when precise measurements are needed. Knowing where bathymetric 
predictions are precise and where they are not provides managers with information to define and manage risk 
associated with decisions relying on bathymetry or derived products (e.g., defining benthic habitats, estimating 
construction costs, placing shipping lanes). 

2.3 METHODS 

2.3.1 General Modeling Approach 

A geostatistical modeling approach was used to predict a continuous, gridded bathymetric surface from scattered sounding points and to generate corresponding spatially explicit uncertainty estimates. Geostatistical methods are based on the premise that neighboring samples are more similar than samples farther away (Tobler, 1970), a phenomenon known as spatial autocorrelation. Spatial autocorrelation can be detected, quantified and modeled by semivariogram analysis, and used to make predictions at locations that have not been measured (Cressie, 1993; Chiles and Delfiner, 1999). In addition, the same spatial model used to develop predictions can be used to model the uncertainty (i.e., expected precision) of predictions.

Figure 2.1. Spatial extent of selected multibeam and sidescan sonar surveys in the study area. Survey boundaries are overlaid on bathymetry data from the Coastal Relief Model blended with the ETOPO1 Global Relief Model (Amante and Eakins 2009).

Figure 2.2. (a) Most recent survey year for soundings within 1 km rectangular neighborhoods. Survey year classes generally correspond with the evolution of horizontal positioning and vertical sounding technologies. More recent soundings tend to be more precise. (b) The number of bathymetric soundings per square kilometer. The shelf edge corresponds to the 200 m depth contour.

The geostatistical modeling approach used here follows Cressie (1993), where the estimate of depth at a given location, Z(x,y), is modeled as a linear combination (sum) of components representing a deterministic mean trend, μ(x,y), a spatially structured random process, δ(x,y), and non-spatially structured error, ε:

$$ Z(x,y) = \mu(x,y) + \delta(x,y) + \varepsilon \qquad \text{(Equation 2.1)} $$



Equation 2.1 defines the workflow used to arrive at Z(x,y). The deterministic mean trend and the spatially structured random process with its error term are modeled separately and then combined by summation (see Figure 2.3 for a schematic representation of the workflow). The deterministic mean trend, μ(x,y), is estimated using a suitable smoothing function, and residuals of the original data from this smoothing function are computed at the data positions by subtracting the trend prediction from the observed data value. The spatially structured random process, δ(x,y), and error term, ε, are then estimated by fitting a suitable semivariogram model to the empirical semivariogram of the residuals. The error term is defined by the semivariogram nugget and represents error that is not spatially correlated, which includes both measurement error and variability occurring at spatial scales shorter than the sampling resolution (Cressie, 1993).
Figure 2.3. General workflow describing the geostatistical approach used to develop the predictive model for bathymetry (see Sections 2.3.3 and 2.3.4 for a more detailed description of the methods).



Geostatistical models involve a number of statistical assumptions (for detailed discussions see Cressie, 1993; 
Chiles and Delfiner, 1999). The accuracy of model predictions and uncertainty bounds depends to varying 
degrees on these assumptions being met. Thus, an important part of any geostatistical analysis is model 
validation, which is usually done by cross-validation, a process in which some data are left out for purposes of 
model fitting and model predictions are tested against those data. Model predictions can also be tested against 
high-precision "ground-truth" datasets where such datasets are available. We use both methods to validate 
model predictions in this chapter (see Section 2.3.4). 

2.3.2 Data Acquisition and Preparation 

Raw Sounding Data 

To develop a new geostatistical bathymetric model for the NY Bight, all available National Ocean Service
(NOS) Hydrographic Survey Data overlapping the study area were downloaded from the National Geophysical
Data Center Hydrographic Survey Database (http://www.ngdc.noaa.gov/mgg/bathymetry/hydro.html) on April 
21, 2011. Survey measurements and metadata were extracted from raw HYD93 formatted files and exported 
into plain ASCII text tabular files using custom parsing scripts. Depths are represented as positive numbers 
with increasing depth below sea level defined by the Mean Lower Low Water (MLLW) vertical datum. Sounding 
data were merged with survey metadata, so that information detailing when and how each sounding was 
collected was retained. Sounding locations, originally in decimal degrees (NAD83 datum), were projected into 
a Universal Transverse Mercator projection (UTM 18N), since subsequent processing requires measurement 
of point-to-point distances in a Cartesian coordinate system. UTM 18N has its central meridian at 75°W and 
thus allows calculation of distances in our study area with negligible distortion relative to grid resolution (83.8 
m). 

Hydrographic soundings in the study area come from a multitude of surveys conducted between 1887 and 2004. Surveys used an assortment of positioning and sounding technologies, resulting in a patchwork of overlapping soundings collected with variable sample spacing and different precisions (Figure 2.2). In addition, survey data were processed using different methods, which introduced varying post-processing errors (see Calder, 2006 for a full discussion). These errors can propagate to the final model, creating distortions that do not correspond to changes on the seafloor. While steps have been taken to partially correct for and reduce the impact of these data quality issues, it is important to understand that they cannot be entirely eliminated. No bathymetric model based on historical hydrographic sounding data will be completely free from such considerations.

In general, the vast majority of soundings were retained to maximize data density. However, some soundings were corrected or eliminated prior to modeling. First, based on Calder (2006), we applied a +1.48 m correction to all soundings collected by lead line to correct for the observed bias of lead line soundings compared to multibeam sonar measurements. Second, lead line and VBES surveys were identified that showed evidence of quantization due to rounding to the nearest whole fathom (resulting in an error of up to 1.8 m). Data from these surveys were eliminated when they were located within the footprint of more accurate surveys (i.e., surveys that did not exhibit quantization). Survey footprints were hand-digitized at 1:50,000 in ArcGIS 10 (ESRI, 2011). Different types of rounding and conversions created surveys with varying degrees of quantization, but only those surveys with the most severe (1.8 m) fathom-rounding quantization were eliminated from analysis, and only where more recent information was present. Other types of quantization are expected to result in errors of less than 1 m. Other sources of error in raw soundings, including unaccounted-for changes in vertical and tidal datums, are expected to be small (on the order of tens of centimeters) and are discussed in detail in Calder (2006).
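The two corrections described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in this report: the +1.48 m lead-line offset and the idea of fathom rounding come from the text, but the screening rule (the share of depths lying within a small tolerance of a whole fathom) and its 0.02-fathom tolerance are hypothetical choices made for this example.

```python
import numpy as np

FATHOM_M = 1.8288          # 1 fathom = 6 ft = 1.8288 m (exact)
LEAD_LINE_OFFSET_M = 1.48  # lead-line bias correction quoted in the text (Calder, 2006)


def correct_lead_line(depth_m, is_lead_line):
    """Add +1.48 m to lead-line soundings only (depths positive down, MLLW)."""
    depth_m = np.asarray(depth_m, dtype=float)
    return np.where(is_lead_line, depth_m + LEAD_LINE_OFFSET_M, depth_m)


def fathom_rounded_fraction(depth_m, tol_fathoms=0.02):
    """Hypothetical screen: share of soundings within tol_fathoms of a whole fathom.

    Unquantized depths give roughly 2 * tol_fathoms; a survey rounded to whole
    fathoms gives a value close to 1.0.
    """
    fathoms = np.asarray(depth_m, dtype=float) / FATHOM_M
    return float(np.mean(np.abs(fathoms - np.round(fathoms)) <= tol_fathoms))


# Synthetic example: one survey rounded to whole fathoms, one not.
rng = np.random.default_rng(42)
rounded = np.round(rng.uniform(2.0, 30.0, 5000) / FATHOM_M) * FATHOM_M
continuous = rng.uniform(2.0, 30.0, 5000)
print(fathom_rounded_fraction(rounded), fathom_rounded_fraction(continuous))
```

In practice a screen like this would be applied survey by survey, and, as the text notes, a flagged survey was dropped only where a more accurate survey covered the same footprint.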

Depth Stratification 

The resulting hydrographic sounding database was divided into four strata based on depth thresholds (Table 
2.1). Thresholds were chosen based on depth ranges that correspond to different maximum uncertainty 
specifications under International Hydrographic Organization (IHO) standards (S.44 Order 1 and 2, IHO 1998) 
and on coarse-scale changes in geomorphology (e.g., the continental shelf break). Neighboring strata overlap 
slightly to facilitate merging of outputs from individual strata into a continuous surface. 



Table 2.1. Depths used to stratify hydrographic soundings and the corresponding number of soundings within each stratum.

DEPTH STRATUM | NUMBER OF SOUNDINGS (EXCLUDING OVERLAP) | PERCENTAGE OF SOUNDINGS | ADDITIONAL SOUNDINGS FROM OVERLAP | PERCENTAGE OF SOUNDINGS FROM OVERLAP
0-30 m | 2,077,055 | 83.9% | 0 | 0%
30-100 m | 337,238 | 13.6% | 176,843 | 34.4%
100-200 m | 21,932 | 0.9% | 9,931 | 31.2%
200-2,000 m | 40,196 | 1.6% | 3,517 | 8.0%
Total | 2,476,421 | 100% | 190,291 | 7.1%
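The percentages in Table 2.1 appear to follow two simple ratios: each stratum's share of the 2,476,421 non-overlapping soundings, and the fraction of each stratum's modeled points that come from the overlap zone. The sketch below reproduces them from the table counts. The second definition is inferred from the numbers rather than stated in the text; it also reproduces the n column of Table 2.2.

```python
# Counts transcribed from Table 2.1: (soundings excluding overlap, additional soundings from overlap)
STRATA = {
    "0-30 m": (2_077_055, 0),
    "30-100 m": (337_238, 176_843),
    "100-200 m": (21_932, 9_931),
    "200-2,000 m": (40_196, 3_517),
}


def table_2_1_percentages(strata):
    """Return per-stratum (modeled n, % of soundings, % from overlap) plus totals."""
    total_core = sum(core for core, _ in strata.values())
    total_overlap = sum(overlap for _, overlap in strata.values())
    rows = {}
    for name, (core, overlap) in strata.items():
        share = 100.0 * core / total_core                 # "Percentage of soundings"
        overlap_pct = 100.0 * overlap / (core + overlap)  # inferred "% from overlap"
        rows[name] = (core + overlap, round(share, 1), round(overlap_pct, 1))
    total_pct = round(100.0 * total_overlap / (total_core + total_overlap), 1)
    return rows, total_core, total_overlap, total_pct


rows, total_core, total_overlap, total_pct = table_2_1_percentages(STRATA)
for name, (n_model, share, overlap_pct) in rows.items():
    print(f"{name:>12}: n = {n_model:,}  share = {share}%  from overlap = {overlap_pct}%")
print(f"Total: {total_core:,} + {total_overlap:,} overlap ({total_pct}%)")
```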



Transformation 

Prior to statistical modeling, depth values were transformed using the following logarithmic function to normalize error distributions:

$$ Z_{transform} = \log(Z \cdot b + a) \qquad \text{(Equation 2.2)} $$

where Z is depth in meters (with positive numbers representing depth below the sea surface) and the transformation parameters a and b are taken from the appropriate error model for each depth stratum identified by IHO standards (a = 0.5 m, b = 0.013 when Z < 100 m [IHO S.44 Order 1] and a = 1.0 m, b = 0.023 when Z ≥ 100 m [IHO S.44 Order 2]) (IHO, 1998). This transformation was based on a standard bathymetric error model formulation (IHO, 1998; Calder, 2006) and improves the homogeneity of conditional error variances within local regression and kriging neighborhoods, a desirable statistical property.
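A minimal sketch of the transformation and its inverse (the back-transformation given later as Equation 2.3), assuming "log" means the natural logarithm, which is consistent with the Exp used in the back-transformation. Because the transformed value alone does not identify the stratum, the inverse takes the stratum's a and b explicitly. The 0.02 half-width in the example is a hypothetical value chosen only to show that a symmetric interval in transformed units becomes asymmetric in meters.

```python
import math

# IHO S.44 error-model parameters quoted in the text: (a in metres, b dimensionless)
ORDER_1 = (0.5, 0.013)  # Z < 100 m
ORDER_2 = (1.0, 0.023)  # Z >= 100 m


def iho_params(depth_m):
    """Return (a, b) for the depth range, following the thresholds in the text."""
    return ORDER_1 if depth_m < 100 else ORDER_2


def to_transformed(depth_m):
    """Equation 2.2, natural log assumed: Z_transform = log(Z*b + a)."""
    a, b = iho_params(depth_m)
    return math.log(depth_m * b + a)


def from_transformed(z_t, a, b):
    """Inverse of Equation 2.2: Z = (exp(Z_transform) - a) / b."""
    return (math.exp(z_t) - a) / b


# A symmetric interval in transformed units becomes asymmetric in metres.
depth = 40.0
a, b = iho_params(depth)
t = to_transformed(depth)
half_width = 0.02  # hypothetical +/-1.96*SE in transformed units
lower = from_transformed(t - half_width, a, b)
upper = from_transformed(t + half_width, a, b)
print(f"{lower:.2f} m < {depth} m < {upper:.2f} m")
```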

2.3.3 Development of the Bathymetric Model 

The deterministic mean bathymetric trend surface was estimated using LOESS, a semi-parametric local 
regression technique (Cleveland and Devlin, 1988). LOESS estimates a smooth trend surface using weighted 
least-squares regression in local neighborhoods defined by a fixed number of points closest to each prediction 
location (the span, measured as a percentage of the total number of data points). Specifically, quadratic 
LOESS was used with a span of 1%, corresponding to an average neighborhood width of between 3 and 12 
km depending on point density. 
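The LOESS step can be illustrated at a single prediction location. The sketch below assumes Cleveland's tricube weighting over the k = span × N nearest soundings and fits a weighted quadratic in coordinates centred on the prediction point, so the intercept is the fitted value. It uses SciPy's cKDTree for the neighbor search; it is not the modified Matlab LowessFit routine used for this report, and the test surface is hypothetical. A useful sanity check is that quadratic LOESS reproduces an exactly quadratic surface.

```python
import numpy as np
from scipy.spatial import cKDTree


def loess_quadratic(xy, z, xy0, span=0.01, tree=None):
    """Quadratic LOESS estimate at xy0 from the span*N nearest soundings (tricube weights)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy0 = np.asarray(xy0, dtype=float)
    k = max(7, int(np.ceil(span * len(z))))  # the k-th neighbour gets zero weight, so keep >= 7
    tree = cKDTree(xy) if tree is None else tree
    dist, idx = tree.query(xy0, k=k)
    w = (1.0 - (dist / dist.max()) ** 3) ** 3  # tricube weights
    dx = xy[idx, 0] - xy0[0]                   # local coordinates centred on xy0
    dy = xy[idx, 1] - xy0[1]
    X = np.column_stack([np.ones(k), dx, dy, dx * dx, dx * dy, dy * dy])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z[idx] * sw, rcond=None)
    return float(beta[0])                      # intercept = fitted value at xy0


def quadratic_surface(x, y):
    """Hypothetical smooth test surface."""
    return 10 + 0.5 * x - 0.2 * y + 0.01 * x * x + 0.02 * x * y - 0.03 * y * y


rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 10.0, size=(400, 2))
vals = quadratic_surface(pts[:, 0], pts[:, 1])
est = loess_quadratic(pts, vals, (3.3, 4.4), span=0.1)
print(est, quadratic_surface(3.3, 4.4))
```

Passing a prebuilt `tree` lets many prediction locations share one k-d tree, which is the kind of speed-up the next paragraph describes.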

LOESS was implemented in Matlab version 7.13 (R2011b) with the Curve Fitting toolbox (The MathWorks Inc., 
Natick, MA). The standard Matlab toolbox function (curvefit/curvefit/+curvefit/LowessFit.m) was modified to 
reduce processing times and increase matrix stability. Execution speed was improved by using k-dimensional 
search trees (KD-Tree for MATLAB, Tagliasacchi, 2011) to identify and sort soundings in each local neighborhood. 
Under certain conditions local regression methods such as LOESS can exhibit instability due to limits on the 
precision of matrix calculations. X and Y coordinates were centered and re-scaled to minimize the possibility 
of matrix stability problems. The condition number of each local regression design matrix was also evaluated 
to diagnose areas where matrix precision might affect the accuracy of regression fits. 
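The benefit of centering and re-scaling coordinates can be seen in the condition number of a local quadratic design matrix. The sketch below compares raw UTM coordinates with centred and re-scaled coordinates for a hypothetical 5 km neighbourhood. The scaling rule shown (one common scale factor for X and Y, so distances stay isotropic) is an assumption, not necessarily the rule used in the modified Matlab routine.

```python
import numpy as np


def quad_design(x, y):
    """Design matrix of a second-order (quadratic) local regression."""
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def centered_scaled(x, y):
    """Centre on the neighbourhood mean and rescale both axes by one common factor."""
    xc, yc = x - x.mean(), y - y.mean()
    s = np.sqrt((xc**2 + yc**2).mean() / 2.0)
    return xc / s, yc / s


rng = np.random.default_rng(0)
# Hypothetical 5 km x 5 km neighbourhood expressed in raw UTM 18N metres
x = 580_000 + rng.uniform(0, 5_000, 200)
y = 4_480_000 + rng.uniform(0, 5_000, 200)

cond_raw = np.linalg.cond(quad_design(x, y))
cond_scaled = np.linalg.cond(quad_design(*centered_scaled(x, y)))
print(f"raw UTM coordinates: {cond_raw:.2e}; centred and re-scaled: {cond_scaled:.1f}")
```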



Local regression matrix stability was problematic when points that were very close to each other had very different values, which occasionally occurred in areas of high sounding density and resulted in gaps in the LOESS prediction surface. To eliminate these gaps, soundings within a horizontal distance of 10 cm of each other were identified, grouped and then dispersed with a small random nudge. Coordinates for the first occurrence of a sounding in a group were retained, and subsequent coordinates were shifted by adding a uniform random number in the range ±(0.5, 1.5) m. Displacements were only accepted if they did not create conflicts (within 10 cm) with other soundings. A total of 700 soundings were modified (<0.03% of all data). Although this dispersion adds some positional error to each displaced sounding, the displacement is negligible in relation to other sources of positioning error, such as those caused by geographic positioning systems or ship heave, pitch and roll.

A similar displacement procedure was applied to soundings that fell within 10 cm of the prediction grid coordinates. The purpose of this was to ensure that measurement and micro-scale error were filtered out of the prediction surface, because at the precise locations of original data, the kriging prediction surface exhibits spikes due to the nugget effect. Filtering the nugget effect in this way is similar to the maximum a posteriori resampling technique proposed for filtering noisy bathymetric data by Goff et al. (2006).

The parametric standard error of the mean trend was estimated using a Monte Carlo approach. Specifically, the approximation method of Durban et al. (1999) was used to estimate the variance-covariance matrix of the estimated local regression coefficients, Var(β̂). The scale of the variance-covariance matrix was estimated as the sum of squares of the residuals for the whole model (i.e., the residuals of the original data from the LOESS fit at all data points), divided by (N − A), where N is the number of observations and A is the effective number of parameters, estimated as A = 2(1 + N/(N × span)).

Regression coefficient vectors were simulated by 1,000 draws from a multivariate normal distribution defined by mean vector β̂ and covariance matrix Var(β̂), and the LOESS prediction was re-calculated for each simulated regression coefficient vector. The standard error was estimated as the standard deviation of the simulated LOESS predictions at each location. The condition number of the design matrix of each local regression was also recorded as an additional diagnostic measure.

Residuals were obtained by subtracting the LOESS trend surface prediction at each data location from the observed data value. Semivariograms of residuals were then calculated and modeled in ArcGIS 10 with the Geostatistical Analyst extension (ESRI, 2011). A separate anisotropic semivariogram model was fit independently for each depth stratum (Figure 2.4, Table 2.2). The nugget effect was adjusted manually based on visual inspection and prior expectations from measurement error models (see below). The rest of the model parameters, including anisotropy ranges and direction, were fit automatically using non-linear weighted least-squares (ESRI, 2011).

Figure 2.4. Estimated semivariograms of residuals and fitted semivariogram models for each depth stratum: (a) 0-30 m, (b) 30-100 m, (c) 100-200 m, (d) 200-2,000 m. Red dots represent sample semivariance values, blue crosses represent averaged semivariance values, and solid blue lines represent directional semivariogram model fits.
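The empirical semivariogram of residuals (half the squared difference between pairs of residuals, averaged within distance lags) can be sketched as follows. This simplified version is omnidirectional and uses every pair of points, so it only suits small samples; the report used ArcGIS Geostatistical Analyst with directional (anisotropic) fits. The helper `nugget_share` is a hypothetical name; it assumes the last column of Table 2.2 is nugget / (nugget + partial sill), an inference that reproduces the published percentages.

```python
import numpy as np


def empirical_semivariogram(xy, r, lag, n_lags):
    """Omnidirectional empirical semivariogram of residuals r at points xy.

    Returns bin centres, mean semivariance per bin (NaN for empty bins) and pair counts.
    """
    xy = np.asarray(xy, dtype=float)
    r = np.asarray(r, dtype=float)
    i, j = np.triu_indices(len(r), k=1)             # each unordered pair once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)       # pair separation distance
    gamma = 0.5 * (r[i] - r[j]) ** 2                # half squared difference
    lag_bin = np.floor(h / lag).astype(int)
    keep = lag_bin < n_lags
    counts = np.bincount(lag_bin[keep], minlength=n_lags)
    sums = np.bincount(lag_bin[keep], weights=gamma[keep], minlength=n_lags)
    with np.errstate(invalid="ignore", divide="ignore"):
        semivar = sums / counts
    centres = (np.arange(n_lags) + 0.5) * lag
    return centres, semivar, counts


def nugget_share(nugget, partial_sill):
    """Percentage of the total sill (nugget + partial sill) attributable to the nugget."""
    return 100.0 * nugget / (nugget + partial_sill)


# Tiny check on three collinear points one metre apart
centres, semivar, counts = empirical_semivariogram(
    [[0, 0], [1, 0], [2, 0]], [0.0, 1.0, 2.0], lag=1.0, n_lags=3)
print(semivar, counts)
# 30-100 m stratum, Table 2.2: nugget 2.46e-5, partial sill 0.19e-3
print(round(nugget_share(2.46e-5, 0.19e-3), 1))
```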



Table 2.2. Semivariogram parameters for each depth stratum.

DEPTH STRATUM (m) | n | TYPE OF VARIOGRAM MODEL | NO. OF LAGS | LAG SIZE (m) | NUGGET (×10^-5) | MAJOR RANGE (m) | MINOR RANGE (m) | DIRECTION (°)* | PARTIAL SILL (×10^-3) | % OF THE SILL DUE TO THE NUGGET
0-30 | 2,077,055 | Exp | 100 | 100 | 2.83 | 1,615 | 963.25 | 64.86 | 11.8 | 0.2
30-100 | 514,081 | Gau | 100 | 100 | 2.46 | 480 | 180.55 | 45.00 | 0.19 | 11.5
100-200 | 31,863 | Gau | 175 | 58 | 8.78 | 1,787 | 442.38 | 171.74 | 0.488 | 15.2
200-2,000 | 43,713 | Gau | 180 | 56 | 150.0 | 1,882 | 1,358.12 | 119.88 | 5.91 | 20.2



Exp = exponential; Gau = Gaussian; *Clockwise from North

Prediction, ±1 standard error, and 95% confidence interval surfaces were back-transformed to depths in meters using Equation 2.3.

Nugget selection was guided by a Monte Carlo simulation of measurement error expected for each depth 
stratum based on maximum measurement error models defined by IHO standards (IHO S.44 Orders 1 and 2). 
We simulated depth observations with measurement error across the depth range of each stratum (n = 100,000
points), log-transformed the simulated depths using Equation 2.2, calculated residuals from the recorded 
depth, and calculated the depth-averaged measurement error for each stratum in the log-transformed space. 
This served as a lower bound on the nugget for semivariogram fitting, which was then adjusted higher if 
necessary based on the best fit to empirical semivariogram plots. The rationale for this approach is that the 
minimum value of the nugget is equal to measurement error. Small-scale spatial features not resolved by the 
sample spacing (so-called "micro-scale structures") can add to this error and raise the value of the nugget, but 
not lower it. 
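The Monte Carlo nugget lower bound can be sketched as below. Two assumptions are not stated in the text and are labeled here: true depths are drawn uniformly over the stratum, and the IHO allowance sqrt(a² + (b·Z)²) is treated as a 95% bound on normally distributed errors (so σ = allowance / 1.96). Because the nugget is the variance of the spatially uncorrelated error, the sketch returns the mean squared difference between transformed observed and transformed true depths.

```python
import numpy as np


def nugget_lower_bound(z_min, z_max, a, b, n=100_000, seed=0):
    """Monte Carlo measurement-error variance in log-transformed units (Equation 2.2).

    Assumptions (illustrative): depths uniform over the stratum; IHO allowance
    sqrt(a^2 + (b*Z)^2) is a 95% bound on normal errors, so sigma = allowance / 1.96.
    """
    rng = np.random.default_rng(seed)
    z_true = rng.uniform(z_min, z_max, n)
    sigma = np.sqrt(a**2 + (b * z_true) ** 2) / 1.96
    z_obs = z_true + rng.normal(0.0, sigma)
    resid = np.log(z_obs * b + a) - np.log(z_true * b + a)  # transform both, then difference
    return float(np.mean(resid**2))  # variance of the uncorrelated error = nugget lower bound


nug_0_30 = nugget_lower_bound(0.0, 30.0, a=0.5, b=0.013)   # IHO Order 1 parameters
print(f"0-30 m stratum, simulated nugget lower bound: {nug_0_30:.2e}")
```

The result from such a sketch depends directly on the assumed error distribution, so it should be read as a starting point that is then adjusted upward against the empirical semivariogram, as the text describes.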

To perform ordinary kriging of residuals, semivariogram model parameters fitted in ArcGIS 10 (ESRI, 2011) were input into the KT3D module of GSLIB (Geostatistical Software Library; Deutsch and Journel, 1992). KT3D was used instead of ArcGIS 10 because of the prohibitively slow computational speed of the ArcGIS kriging implementation. KT3D was run with ordinary kriging, 8-sector search neighborhoods, the minimum search radius necessary for a gap-free kriging prediction, and a maximum search radius equal to the minimum radius times the anisotropy ratio. At least 1 and no more than 80 points (a maximum of 10 from each sector) were used to produce each kriging prediction. Table 2.3 provides more detailed information on search neighborhood parameters by stratum.
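Ordinary kriging of residuals at one location can be sketched by solving the kriging system written in semivariogram form, with a Lagrange multiplier that forces the weights to sum to one. This is a deliberate simplification of what KT3D does: it is isotropic, uses every data point instead of an 8-sector search, and assumes the practical-range form of the exponential model (3h/range). The data are hypothetical. The example also shows the nugget "spike" discussed earlier: at the exact location of a sounding, kriging returns that sounding's residual with zero variance, while 1 m away the nugget is filtered.

```python
import numpy as np


def exp_semivariogram(h, nugget, psill, range_m):
    """Exponential model (practical-range form); gamma(0) = 0, nugget applies for h > 0."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-3.0 * h / range_m))
    return np.where(h > 0.0, g, 0.0)


def ordinary_krige(xy, r, x0, nugget, psill, range_m):
    """Ordinary kriging of residuals r at x0 using all points; returns (estimate, variance, weights)."""
    xy = np.asarray(xy, dtype=float)
    r = np.asarray(r, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = len(r)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(d, nugget, psill, range_m)
    A[n, n] = 0.0                      # border row/column enforce sum(weights) = 1
    rhs = np.ones(n + 1)
    rhs[:n] = exp_semivariogram(np.linalg.norm(xy - x0, axis=1), nugget, psill, range_m)
    sol = np.linalg.solve(A, rhs)
    weights, mu = sol[:n], sol[n]
    return float(weights @ r), float(weights @ rhs[:n] + mu), weights


rng = np.random.default_rng(3)
pts = rng.uniform(0.0, 1000.0, size=(30, 2))
res = rng.normal(0.0, 0.01, 30)        # hypothetical LOESS residuals (transformed units)
params = dict(nugget=1e-4, psill=1e-3, range_m=500.0)

est_at, var_at, _ = ordinary_krige(pts, res, pts[5], **params)                     # on a sounding
est_off, var_off, w_off = ordinary_krige(pts, res, pts[5] + [1.0, 0.0], **params)  # 1 m away
print(est_at - res[5], var_at, est_off - res[5], var_off)
```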



Table 2.3. Search neighborhood parameters by depth stratum.

DEPTH STRATUM (m) | MINIMUM SEARCH RADIUS (km) | MAXIMUM SEARCH RADIUS (km) | SEARCH ELLIPSOID ANGLE (°)*
0-30 | 2.0 | 3.353 | 64.86
30-100 | 2.0 | 5.317 | 45.00
100-200 | 3.0 | 12.117 | 171.74
200-2,000 | 5.0 | 6.929 | 119.88

*Clockwise from North
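The rule stated above (maximum search radius = minimum radius × anisotropy ratio) can be checked against the published parameters. Taking the minimum radii as 2.0, 2.0, 3.0 and 5.0 km and the anisotropy ratio as major range / minor range from Table 2.2, the sketch below reproduces the larger radii to within about 0.002 km; the small discrepancy in the 100-200 m row presumably reflects rounding of the published ranges.

```python
# Table 2.2 major/minor ranges (m) and Table 2.3 minimum search radii (km), by stratum
STRATA = {
    "0-30": (1615.0, 963.25, 2.0),
    "30-100": (480.0, 180.55, 2.0),
    "100-200": (1787.0, 442.38, 3.0),
    "200-2,000": (1882.0, 1358.12, 5.0),
}


def max_search_radius(min_radius_km, major_m, minor_m):
    """Maximum search radius = minimum radius x anisotropy ratio (major / minor range)."""
    return min_radius_km * major_m / minor_m


for name, (major, minor, r_min) in STRATA.items():
    r_max = max_search_radius(r_min, major, minor)
    print(f"{name:>9} m: anisotropy ratio {major / minor:.3f}, max radius {r_max:.3f} km")
```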



At each grid location for which sufficient data existed to produce a kriging prediction, the LOESS trend surface 
was evaluated and estimates of LOESS prediction standard error and condition number were produced. The 
kriging prediction, kriging variance, LOESS prediction, LOESS variance, and LOESS condition number were 
exported from Matlab and GSLIB formats to ESRI GRID format for post-processing using the Spatial Analyst 
extension in ArcGIS 10 (ESRI, 2011). 

The model surface representing predicted depth for each stratum was then calculated as the sum of the LOESS and kriging prediction surfaces (see Equation 2.1). The corresponding prediction standard error surface was calculated as the square root of the sum of the LOESS variance and kriging variance estimates. This calculation of the total variance assumes that the spatially structured random error components (δ and ε in Equation 2.1) are uncorrelated with the mean component (μ). The prediction standard error was used to construct ±1 standard error and 95% confidence interval surfaces (using the standard normal distribution critical value of 1.96).



$$ Z = \frac{\exp(Z_{transform}) - a}{b} \qquad \text{(Equation 2.3)} $$

where Z_transform is the depth prediction in transformed units and a and b are the error model parameters described for Equation 2.2.
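The final per-stratum step (sum the trend and kriged residual, combine variances, build intervals in transformed units, then apply Equation 2.3) can be sketched as below. The input values are hypothetical, and the combination assumes, as the text does, that the LOESS and kriging components are uncorrelated. Because Equation 2.3 is convex, intervals that are symmetric in transformed units come out wider on the deep side in meters.

```python
import numpy as np


def final_surfaces(loess_pred, loess_var, krig_pred, krig_var, a, b, z_crit=1.96):
    """Sum trend and kriged residual (Equation 2.1), build intervals, back-transform (Equation 2.3).

    All inputs are in log-transformed units; a and b are the stratum's error-model parameters.
    """
    pred_t = np.asarray(loess_pred, float) + np.asarray(krig_pred, float)
    se_t = np.sqrt(np.asarray(loess_var, float) + np.asarray(krig_var, float))  # assumes independence

    def back(v):
        return (np.exp(v) - a) / b    # Equation 2.3

    return {
        "depth": back(pred_t),
        "minus_1se": back(pred_t - se_t),
        "plus_1se": back(pred_t + se_t),
        "ci95_low": back(pred_t - z_crit * se_t),
        "ci95_high": back(pred_t + z_crit * se_t),
    }


a, b = 0.5, 0.013                     # IHO Order 1 parameters (Z < 100 m)
s = final_surfaces(np.log(40 * b + a), 1e-5, 0.001, 4e-5, a, b)   # hypothetical inputs
for name, value in s.items():
    print(f"{name:>10}: {float(value):.2f} m")
```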

Finally, the separate surfaces representing predicted depth and uncertainty for all four strata were mosaicked 
to generate seamless surfaces covering the whole study area. At locations with more than one prediction 
(i.e., where strata overlap), values for the mean (or variance) were calculated by a weighted average, where 
weights corresponded to the inverse of prediction variance (normalized by the sum of the weights for all the 
depth strata). 
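The mosaicking rule can be sketched for a stack of co-registered stratum grids. As described in the text, the same normalized inverse-variance weights are applied to both the predictions and the variances. The array layout (strata × cells, NaN where a stratum has no prediction) is an assumption made for the example.

```python
import numpy as np


def mosaic(preds, variances):
    """Inverse-variance weighted mosaic across overlapping strata.

    preds, variances: arrays of shape (n_strata, n_cells), NaN where a stratum has no prediction.
    The same normalized weights are applied to the means and to the variances.
    """
    preds = np.asarray(preds, dtype=float)
    variances = np.asarray(variances, dtype=float)
    valid = np.isfinite(preds) & np.isfinite(variances)
    w = np.where(valid, 1.0 / np.where(valid, variances, 1.0), 0.0)  # assumes variances > 0
    wsum = w.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        wn = w / wsum                                                  # normalized weights
    mean = np.where(valid, wn * np.where(valid, preds, 0.0), 0.0).sum(axis=0)
    var = np.where(valid, wn * np.where(valid, variances, 0.0), 0.0).sum(axis=0)
    empty = wsum == 0
    mean[empty] = np.nan
    var[empty] = np.nan
    return mean, var


# Two strata over three cells: overlap, single stratum, no data
preds = [[50.0, 60.0, np.nan], [52.0, np.nan, np.nan]]
variances = [[1.0, 2.0, np.nan], [4.0, np.nan, np.nan]]
mean, var = mosaic(preds, variances)
print(mean, var)
```

Averaging the variances with these weights is more conservative than the textbook variance of an inverse-variance weighted mean, 1/Σ(1/σ²); in the overlap cell above the two give 1.6 and 0.8, respectively.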

2.3.4 Model Validation and Accuracy Assessment 

Cross-validation 

A cross-validation exercise was carried out for each depth stratum to assess the accuracy of the geostatistical 
modeling approach. For purposes of this exercise, 50% of the data points in each stratum were selected at 
random for inclusion as "training data", with the remaining points held out as "validation data." Models were 
developed following the methods above applied only to the training data, and predictions were evaluated at the 
validation data locations. The values of the mosaicked prediction and mosaicked ±1 standard error surfaces built from the training dataset were extracted at the validation point locations, and cross-validation error statistics (Mean Absolute Error [MAE], Mean Absolute Percentage Error [MAPE], and Root Mean Square Error [RMSE]) were calculated. Since the final model was produced using the entire dataset, which is twice as large as the training dataset, these cross-validation statistics represent a conservative upper bound on the error statistics of the final model.
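The 50/50 split and the error statistics reported in Table 2.4 can be sketched as follows. The sign convention for bias (observed minus predicted, so that a negative value means predictions are too deep when depths are positive down) is inferred from the Table 2.4 caption rather than stated in the methods; the helper names are hypothetical.

```python
import numpy as np


def split_half(n, seed=0):
    """Randomly assign half of n soundings to training and the rest to validation."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[: n // 2], idx[n // 2:]


def cv_stats(observed, predicted):
    """Bias, MAE, MAPE (%) and RMSE of predictions at held-out soundings (depth positive down)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = obs - pred                                   # negative = prediction too deep
    return {
        "bias": float(err.mean()),
        "mae": float(np.abs(err).mean()),
        "mape": float(100.0 * np.mean(np.abs(err) / obs)),  # undefined where obs == 0
        "rmse": float(np.sqrt(np.mean(err**2))),
    }


train, val = split_half(11)
stats = cv_stats([10.0, 20.0], [11.0, 19.0])
print(len(train), len(val), stats)
```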

Independent Accuracy Assessment 

In addition to cross-validation, geostatistical model predictions were also compared to depth predictions of the CRM (described in Section 2.1) and to a multibeam dataset, hereafter referred to as the STRATAFORM survey. The STRATAFORM survey collected soundings offshore of New York and New Jersey around 39°12'N, 72°50'W as part of the STRATAFORM project using an EM1000 MBES (Mayer et al., 1999; Nittrouer, 1999; Goff et al., 1999). The STRATAFORM survey covered 2,500 km² of seafloor in water depths ranging from 20 to 400 m and provides depth estimates for a contiguous surface at 10 m gridded resolution. Although MBES data do contain some error, for our purposes we consider them to represent the "ground truth", since the accuracy and resolution of MBES surveys are much better than those of interpolated and compiled archival hydrographic surveys. To facilitate comparison, our 83.8 m model grid was overlaid on the 10 m STRATAFORM grid and the mean STRATAFORM value in each model grid cell was calculated. The geostatistical model and the CRM were compared to the STRATAFORM survey within the STRATAFORM survey footprint by calculating mean difference (bias), MAE, MAPE and RMSE. Comparison statistics were calculated for the entire area of overlap and for depth strata within that area (30-100 m, 100-200 m).
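Averaging the 10 m STRATAFORM grid within each 83.8 m model cell can be sketched as below. Because 83.8 m is not a multiple of 10 m, this sketch assigns each 10 m cell center to the model cell that contains it; the grid origin, the row orientation (row 0 at the southern edge) and the function name are assumptions for illustration. The resulting cell means could then be compared with model predictions using the same bias, MAE, MAPE and RMSE statistics as the cross-validation.

```python
import numpy as np


def block_mean(x, y, z, x0, y0, cell, ncols, nrows):
    """Mean of fine-grid values z (at cell centres x, y) within each coarse model cell.

    Row 0 is the southernmost row (y0 is the grid's lower-left corner); empty cells are NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    col = np.floor((x - x0) / cell).astype(int)
    row = np.floor((y - y0) / cell).astype(int)
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    flat = row[inside] * ncols + col[inside]
    sums = np.bincount(flat, weights=z[inside], minlength=ncols * nrows)
    counts = np.bincount(flat, minlength=ncols * nrows)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means.reshape(nrows, ncols)


grid = block_mean([5, 15, 90, 95], [5, 5, 5, 5], [1.0, 3.0, 10.0, 20.0],
                  x0=0.0, y0=0.0, cell=83.8, ncols=3, nrows=1)
print(grid)
```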

2.4 RESULTS AND DISCUSSION 

2.4.1 Bathymetry Model Predictions 

The new bathymetric model extends over the continental shelf and across the shelf slope, covering the 
majority of the planning area (Figure 2.5). Depth predictions ranged from near 0 m at the shore to around 2,100 m
on the shelf slope. Some nearshore areas, like the approach to New York Harbor, were not modeled due to 
processing limitations arising from extremely high data density. A few small patches in the nearshore areas 
off southern New Jersey and western Long Island did not have model predictions because the geostatistical 
model was unable to produce predictions where soundings with distinctly different measured depths occurred 
at virtually the same location. This occurred where older, less accurate surveys coincided with more recent, 
more accurate surveys. 
Figure 2.5. Bathymetric surface developed from the geostatistical model, draped over separate hillshade layers derived for the continental shelf (0-200 m) and the shelf slope (>200 m). Solid black lines depict depth contours derived from the bathymetric surface.



Figure 2.6. Estimated error of predictions (standard error) from the bathymetric model. Model standard error provides an indication of 
prediction uncertainty. 



Duane et al. (1972) found that sand ridges were a dominant geomorphologic feature on most of the northeast 
U.S. Atlantic continental shelf. These features were evident in the bathymetry model, particularly to the west 
of Hudson Canyon and in the northeast of the study area. Submarine canyons, like Hudson Canyon, and 
shallower networks of gullies were also evident in the model along the shelf slope. 

2.4.2 Bathymetry Model Uncertainty 

Model standard error ranged from 0.026 m to almost 200 m over the study area (Figure 2.6). In general, 
model standard error was less than 5 m at depths shallower than 30 m. In this depth stratum, standard 
error was relatively higher (2-10 m) in areas where surveys occurred prior to 1958 (Figure 2.2). At depths 
from 30 m to 100 m, standard error was typically less than 2 m, but reached as much as 5 m in some areas 
where depths approached 100 m. Standard error typically ranged from 2-5 m for depths between 100 and 
200 m but was higher (5-10 m) in some areas where depths approached 200 m at the shelf edge. Along the 
shelf slope, standard error increased from 10-20 m at depths closer to 200 m to greater than 50 m in areas 
deeper than 500 m. Although standard error generally increased with sounding depth beyond the 30 m depth contour, error also depended on the distance between surveys. As expected, within each depth stratum, error was generally lower along survey transects where the distance between soundings was shortest (lines clearly distinguishable in Figure 2.6). There were two primary reasons for higher error along the shelf slope compared with shallow areas. First, the absolute precision of sounding instruments generally decreases with depth; second, soundings become sparser farther offshore. There were several areas south of Hudson Canyon where sounding tracks are more than 8 km apart (Figure 2.2).

The prediction standard error surface indicates the uncertainty associated with the model prediction at each 
location, assuming that statistical assumptions of the model are met. Local regressions can be inaccurate 
as the limits of matrix precision are approached, as indicated by high condition numbers (Figure 2.7). The 
condition number is a diagnostic that indicates the stability of the local regression trend model at each location. 
Higher spatial condition number values indicate that the regression solution is less stable at that location, 
such that small variations in the input data (e.g., uncertainty due to measurement error) can result in large 
variations in the prediction. For second order polynomials, the critical spatial condition number threshold value 
is approximately 100, meaning that predictions should be considered with caution at locations where the 
spatial condition number is close to 100 and should be considered unreliable where it is greater than 100 
(Golub and Van Loan, 1996). Under these conditions, the standard error surface may underestimate actual 
error. This occurs only in a narrow band along the southern coast of Long Island. 
Figure 2.7. Spatial condition number from the LOESS trend model. Condition number classes reflect the threshold at which standard error predictions should be considered unreliable.

Model uncertainty was also depicted using theoretical 95% confidence intervals of the depth predictions 
(predicted depth ± 1.96 × standard error) along two hypothetical transects (Transect 1 and Transect 2) that
spanned from the shoreline to the shelf slope (Figure 2.8). Transect 2 differed from Transect 1 in that it 
cut across a submarine canyon (Hudson Canyon) at the shelf edge. Maximum and minimum depth values 
representing the upper and lower bounds of the 95% confidence intervals were extracted at 100 m intervals 
along each transect. For both transects, the width of the 95% confidence intervals generally increased with 
distance from shore and with depth. However, model uncertainty was greater and more variable in the 0-30 m 
depth stratum than it was in the 30-100 m depth stratum (Figure 2.9, Figure 2.10). At depths greater than 200 
m, the 95% confidence interval widths increased dramatically with distance from shore and had an average 
vertical width of almost 0.25 km. 

To explore how uncertainty in depth predictions may translate into horizontal uncertainty and how this uncertainty 
could impact management decisions (e.g., the siting of a wind farm), depth contours (30 m, 50 m, 100 m, 200 
m) derived from the model prediction and from the upper and lower limits of the theoretical 95% confidence 
intervals (±1.96*standard error) were overlaid on the boundaries of theoretical wind farm areas within the NY 
study area (Figure 2.11). These theoretical wind farm areas correspond to areas outside of shipping lanes 
and within current depth constraints for wind farm structures. These depth contours were mapped to produce 
a rough estimate of the horizontal uncertainty associated with the model predictions. Within the potential wind farm areas, transects were drawn between the 50 m depth contours derived from the confidence interval limits. The transects were drawn approximately perpendicular to the 50 m depth contour derived from the model prediction at intervals of approximately 5 km.

In this region, the mean distance between the depth contours derived from the confidence interval limits was approximately 8 km, with a standard deviation of almost 3 km. While this measure represents only a rough estimate of horizontal uncertainty, it suggests that depth predictions from this model and other models developed using similar data should be used with caution when high positional precision is needed.
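The translation of vertical uncertainty into horizontal uncertainty can be illustrated along a single hypothetical cross-shelf profile: find where the deep (+1.96 × SE) and shallow (-1.96 × SE) limits each first reach the 50 m contour, and take the distance between those crossings. This is a simplification: it uses a symmetric standard error in meters and a 1-D profile, whereas the report worked with back-transformed surfaces and mapped contours. On a gentle slope, even a modest vertical error produces a wide horizontal band.

```python
import numpy as np


def first_crossing(dist_m, depth_m, level_m):
    """Distance at which a profile (depth increasing offshore) first reaches level_m."""
    above = depth_m >= level_m
    i = int(np.argmax(above))
    if not above[i]:
        return float("nan")           # profile never reaches the contour
    if i == 0:
        return float(dist_m[0])
    d0, d1 = depth_m[i - 1], depth_m[i]
    # linear interpolation between the two bracketing samples
    return float(dist_m[i - 1] + (level_m - d0) * (dist_m[i] - dist_m[i - 1]) / (d1 - d0))


def contour_spread(dist_m, pred_m, se_m, level_m=50.0, z_crit=1.96):
    """Horizontal distance between the level_m contours of the upper and lower 95% limits."""
    deep_limit = first_crossing(dist_m, pred_m + z_crit * se_m, level_m)     # reached nearer shore
    shallow_limit = first_crossing(dist_m, pred_m - z_crit * se_m, level_m)  # reached farther out
    return shallow_limit - deep_limit


# Hypothetical transect: 1 m of depth per 100 m offshore, constant 2 m standard error
dist = np.arange(0.0, 6001.0, 100.0)
pred = 20.0 + 0.01 * dist
se = np.full_like(dist, 2.0)
print(contour_spread(dist, pred, se))
```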




Figure 2.8. Locations of transects used to depict theoretical 95% confidence intervals of the geostatistical model predictions as a function of distance from shore and local geomorphology.



2.4.3 Cross-validation of the Training 
Dataset 

Cross-validation results indicated that the geostatistical model performed extremely well in the 0-30 m and 30-100 m depth strata (mean absolute errors of 0.60 m and 0.55 m, respectively). The model performed reasonably well in the 100-200 m depth stratum (mean absolute error of 2.14 m; mean absolute percentage error of 1.40%), but model accuracy was considerably degraded in the 200-2,000 m depth stratum (mean absolute error of 25.76 m; mean absolute percentage error of 3.44%) (Table 2.4).
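The four statistics reported in Tables 2.4 and 2.6 can be computed as in the sketch below. The report does not spell out its sign and normalization conventions, so two assumptions are made here: depths are stored as negative elevations (so a negative mean of predicted minus observed corresponds to the "deep bias" described in the table captions), and the percentage error is taken relative to the absolute observed depth.

```python
import math
from typing import Dict, Sequence


def error_stats(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Bias, MAE, MAPE (%) and RMSE of predictions against observations.

    Assumption: depths are negative elevations (e.g. -50.0 for 50 m), so
    bias = mean(predicted - observed) is negative when predictions are too deep.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }
```

Because MAPE divides by depth, the same absolute error counts for much more in shallow water, which is consistent with the 0-30 m stratum having the largest MAPE in Table 2.4 despite a small MAE.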

Cross-validation was also used to assess the accuracy of the confidence intervals. The theoretical 68% confidence intervals (model prediction ± standard error) are somewhat conservative (i.e., wider than necessary) for all depth strata (Table 2.5). For depths shallower than 100 m, the theoretical 95% confidence intervals (model prediction ± 1.96*standard error) are slightly conservative, but the geostatistical model underestimates error at depths greater than 100 m, especially at depths greater than 200 m. For example, in the 200-2,000 m depth range, model standard errors should be multiplied by a factor of 2.46, rather than the theoretical value of 1.96, to produce true 95% confidence intervals (Table 2.5).
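The coverage check and the standard error multipliers in Table 2.5 can be outlined from cross-validation output, assuming each validation point carries an observed depth, a predicted depth and a model standard error. The sketch reports the fraction of points inside ±z·SE, and an empirical multiplier taken as the smallest k such that at least the nominal fraction of standardized absolute errors |error|/SE is at or below k. The report does not state how its multipliers were derived, so this nearest-rank rule is an assumption.

```python
import math
from typing import Sequence


def coverage(obs: Sequence[float], pred: Sequence[float], se: Sequence[float], z: float) -> float:
    """Fraction of validation points whose error lies within +/- z * SE."""
    inside = sum(1 for o, p, s in zip(obs, pred, se) if abs(o - p) <= z * s)
    return inside / len(obs)


def empirical_multiplier(obs: Sequence[float], pred: Sequence[float],
                         se: Sequence[float], level: float) -> float:
    """Smallest k with at least `level` of |error|/SE values <= k (nearest rank)."""
    scores = sorted(abs(o - p) / s for o, p, s in zip(obs, pred, se))
    rank = math.ceil(level * len(scores))  # 1-based nearest rank
    return scores[max(rank, 1) - 1]
```

A multiplier above 1.96 for the 95% level (as in the 100-200 m and 200-2,000 m strata) means the theoretical intervals are too narrow; below 1.96 means they are conservative.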



Table 2.4. Cross-validation statistics for the geostatistical model built from the training dataset. Negative bias indicates a deep bias while positive bias indicates a shallow bias. MAE = Mean Absolute Error, MAPE = Mean Absolute Percentage Error, RMSE = Root Mean Square Error.

                       CROSS-VALIDATION ERROR
DEPTH STRATUM          BIAS       MAE        MAPE      RMSE
Overall (0-2,000 m)    -0.08 m    1.04 m     6.53%     5.53 m
0-30 m                 -0.17 m    0.60 m     7.72%     1.24 m
30-100 m               0.02 m     0.55 m     1.42%     0.96 m
100-200 m              0.06 m     2.14 m     1.40%     5.48 m
200-2,000 m            2.97 m     25.76 m    3.44%     41.23 m
Figure 2.9. (a) Predicted depth (m) with theoretical 95% confidence intervals (±1.96*standard error) vs. distance from shore (km) along Transect 1. (b) Predicted depth (m) with 95% confidence intervals by depth stratum. (c) Distribution of 95% confidence interval widths with the mean (x̄) and standard deviation (sd) for each depth stratum. The probability density of the confidence interval widths was estimated using a kernel density function (Sarkka 1999).
Figure 2.10. (a) Predicted depth (m) with theoretical 95% confidence intervals (±1.96*standard error) vs. distance from shore (km) along Transect 2. (b) Predicted depth (m) with 95% confidence intervals by depth stratum. (c) Distribution of 95% confidence interval widths with the mean (x̄) and standard deviation (sd) for each depth stratum. The probability density of the confidence interval widths was estimated using a kernel density function (Sarkka 1999).





Table 2.5. Performance of theoretical 68% and 95% confidence intervals.

                       % OF VALIDATION    SE MULTIPLIER        % OF VALIDATION    SE MULTIPLIER
                       DATA WITHIN        FOR 68% CI           DATA WITHIN        FOR 95% CI
DEPTH STRATUM          THEORETICAL        (COMPARED TO 1.0)    THEORETICAL        (COMPARED TO 1.96)
                       68% CI                                  95% CI
Overall (0-2,000 m)    87.50%             0.42                 95.50%             1.84
0-30 m                 87.60%             0.40                 95.40%             1.86
30-100 m               87.00%             0.49                 95.90%             1.77
100-200 m              88.30%             0.35                 94.70%             2.04
200-2,000 m            80.80%             0.62                 92.70%             2.46

CI = confidence interval; SE = standard error.



2.4.4 Independent Accuracy Assessment 

The STRATAFORM dataset is ideally located for an independent accuracy assessment of the geostatistical 
model because it extends across multiple depth strata and overlaps NOS hydrographic soundings collected 
using multiple sounding and positioning methods. 

As a benchmark, we compared our model's performance with that of the CRM in the STRATAFORM area. This may not be an entirely fair comparison, since it is possible (though not verifiable) that the CRM included some version of the STRATAFORM data; we nonetheless proceed, with the caution that the CRM error statistics may be considerably better in this region than in other parts of the study area.



Table 2.6. Results from an accuracy assessment of the geostatistical model and 
the coastal relief model (CRM). The new geostatistical model and the CRM are 
compared against the STRATAFORM data. Negative bias indicates a deep bias 
while positive bias indicates a shallow bias. MAE = Mean Absolute Error, MAPE = 
Mean Absolute Percentage Error, RMSE = Root Mean Square Error. 



Figure 2.11. Depth contours derived from the predicted bathymetric surface and from the upper and lower limits of the theoretical 95% confidence intervals (±1.96*standard error) overlaid on the predicted bathymetric surface and theoretical potential wind farm areas for New York. Within the inset map, approximate uncertainty in depth contour locations was estimated using transects approximately perpendicular to the depth contour derived from the predicted bathymetric surface.



Overall, we found that both the geostatistical model and the CRM are excellent models in the 30-100 m depth range for the STRATAFORM area (mean absolute error [MAE] < 1 m, mean absolute percentage error [MAPE] < 1.5%), but the accuracy of both models degraded with depth below 100 m (Table 2.6). The geostatistical model did not consistently improve upon the CRM in terms of accuracy (when comparing MAE, MAPE, and root mean square error [RMSE]) (Table 2.6), but it was able to provide reliable estimates of uncertainty, at least for depths less than 200 m (see Figure 2.6); obtaining these estimates was the principal reason for developing a geostatistical model in the first place. For the 30-100 m and 100-200 m depth strata, 97.89% and 97.06% of the STRATAFORM data, respectively, fell within the theoretical 95% confidence intervals, indicating that the confidence intervals were reliable (if slightly conservative).

Results on bias were mixed. In the 30-100 m depth stratum, the geostatistical model was approximately unbiased (+0.05 m), whereas the CRM exhibited a slight shallow bias (+0.55 m). However, in the 100-200 m depth stratum, the geostatistical model exhibited more of a deep bias (-0.68 m) than the CRM (-0.35 m). These biases are small and of the magnitude expected from known sources of error (e.g., quantization due to rounding of measurement units, and changes in tidal references and vertical datums; Calder, 2006).



                                   INDEPENDENT ACCURACY ASSESSMENT ERROR
DEPTH STRATUM        STATISTIC     GEOSTATISTICAL MODEL    COASTAL RELIEF MODEL
Overall (30-200 m)   Bias          -0.23 m                 0.21 m
                     MAE           1.17 m                  1.15 m
                     MAPE          1.18%                   1.12%
                     RMSE          3.27 m                  3.39 m
30-100 m             Bias          0.05 m                  0.55 m
                     MAE           0.90 m                  0.84 m
                     MAPE          1.20%                   1.10%
                     RMSE          1.27 m                  1.15 m
100-200 m            Bias          -0.68 m                 -0.35 m
                     MAE           1.60 m                  1.65 m
                     MAPE          1.14%                   1.15%
                     RMSE          5.06 m                  5.32 m




Taking all comparative data together, the CRM may be the better model if the predicted depth itself is the primary variable of interest. However, when uncertainty in depth estimates needs to be accounted for, the geostatistical model should be preferred, particularly when the depths of interest are shallower than 200 m.

Examples of situations where estimates of bathymetric uncertainty may be useful include measuring the amount of habitat area falling into a given depth range, or identifying suitable construction zones for wind farms based on a depth limit (Figure 2.11). In the latter case, uncertainty can be used to gauge the risk of additional development costs and to target the best areas to build within potential construction zones.
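One way to turn a cell's predicted depth and standard error into a siting criterion is to compute the probability that the true depth exceeds a construction depth limit. The sketch below assumes approximately normal, unbiased prediction errors and depths that are positive downward. The optional `multiplier` argument is a hypothetical convenience for inflating the standard error where Table 2.5 shows the theoretical intervals to be too narrow.

```python
import math


def prob_deeper_than(pred_depth_m: float, se_m: float, limit_m: float,
                     multiplier: float = 1.0) -> float:
    """P(true depth > limit) under a normal model N(pred, (multiplier * se)^2).

    Depths are positive downward. `multiplier` optionally inflates the model SE,
    e.g. 2.46 / 1.96 for the 200-2,000 m stratum following Table 2.5.
    """
    sigma = multiplier * se_m
    if sigma <= 0:
        return 1.0 if pred_depth_m > limit_m else 0.0
    z = (limit_m - pred_depth_m) / sigma
    # Upper-tail normal probability: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, a cell predicted at exactly the depth limit has a 50% chance of being deeper than it, while a cell predicted 1.96 standard errors shallower than the limit has roughly a 2.5% chance. Mapping this probability, rather than a single contour, shows where a siting decision is sensitive to bathymetric uncertainty.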

2.5 LIMITATIONS TO INTERPRETATION AND FUTURE DIRECTIONS 

The geostatistical approach we have employed to create a gridded, interpolated bathymetry surface improves on previous bathymetry models in that it generates a spatially explicit error map to accompany the predicted surface. The cross-validation and independent accuracy assessments show that the model performs similarly to the NOAA Coastal Relief Model, with the advantage of providing reliable uncertainty estimates. However, several limitations of, and potential improvements to, our approach should be noted here to support interpretation of our models and development of future efforts. These limitations also apply to the CRM and other modeling techniques. The principal limitations of our geostatistical models arise from three general factors:

1) data quality: integrating diverse soundings collected over time and using different methodologies
results in a variety of potential distortions in the final surface, 

2) resolution: the spatial resolution of original sample data and of the model output grid limit the 
minimum scale of features that can be resolved, and, 

3) model assumptions: geostatistical models involve a number of simplifying assumptions that do 
not fully capture the complexity of underlying geomorphological patterns. 

To help users better understand limitations and appropriate uses of this model, and to guide development 
of future models, we provide brief explanations and examples of these limitations below and suggest some 
potential improvements. 

2.5.1 Data Quality 

Hydrographic soundings in the study area came from a multitude of surveys spanning more than a century (1887-2004). Surveys used an assortment of positioning and sounding technologies, resulting in a patchwork of overlapping soundings collected at varying sample spacings and with different precisions. Errors and inconsistencies among these surveys can propagate to the final model, creating distortions that do not correspond to actual changes on the seafloor.

We have not dealt explicitly with horizontal positioning error. The impact of horizontal positioning error will 
show up in our models as an increase in the nugget effect over the actual instrument measurement error. 
Some studies have integrated estimates of positioning uncertainty explicitly into spatial models (Kielland and 
Tubman, 1994; Jakobsson et al., 2002). Kielland and Tubman (1994) used pseudo-points about the nominal 
location to combine ship position uncertainties with modeling uncertainties. Jakobsson et al. (2002) used 
a direct simulation Monte Carlo method in which an ensemble of possible data configurations was drawn
assuming a distribution of positioning errors. These approaches could be used to improve the precision of our
estimates of bathymetric uncertainty by accounting for differences in positioning certainty between older and 
newer data. 
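The Monte Carlo idea can be illustrated with a toy example in the spirit of Jakobsson et al. (2002), though not their implementation. Each sounding's position is perturbed by Gaussian noise whose standard deviation depends on a survey-era label, the perturbed soundings are re-interpolated at a target location, and the spread of the estimates is read as the positioning contribution to uncertainty. The era labels and error magnitudes are invented, and inverse distance weighting stands in for kriging only to keep the sketch short.

```python
import math
import random
import statistics
from typing import List, Tuple

# Hypothetical horizontal positioning standard deviations (m) by survey era.
POSITION_SD_M = {"pre1940": 200.0, "1940_1990": 50.0, "modern": 5.0}

Sounding = Tuple[float, float, float, str]  # (x_m, y_m, depth_m, era)


def idw(points: List[Tuple[float, float, float]], x0: float, y0: float,
        power: float = 2.0) -> float:
    """Inverse-distance-weighted depth at (x0, y0); a stand-in for kriging."""
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z
        w = d ** -power
        num += w * z
        den += w
    return num / den


def positional_spread(soundings: List[Sounding], x0: float, y0: float,
                      n_draws: int = 500, seed: int = 1) -> Tuple[float, float]:
    """Mean and standard deviation of depth estimates over random position draws."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_draws):
        jittered = [(x + rng.gauss(0.0, POSITION_SD_M[era]),
                     y + rng.gauss(0.0, POSITION_SD_M[era]), z)
                    for x, y, z, era in soundings]
        estimates.append(idw(jittered, x0, y0))
    return statistics.mean(estimates), statistics.stdev(estimates)
```

On a flat seafloor the spread is zero regardless of positioning error; it grows where depth changes rapidly over distances comparable to the positioning error, which is where older, poorly positioned surveys matter most.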

We also did not explicitly account for differences and possible systematic biases in the vertical accuracy of survey data. Archival NOS Hydrographic Survey data have been processed using varying methods over the years, which has introduced some systematic post-processing errors (see Calder, 2006 for a full discussion). Briefly, Calder (2006) reported that archival lead line soundings (common prior to 1978) are systematically shallow-biased because of "hydrographic rounding" (a tendency to round down to the next shallowest whole fathom). Generally, the more recent VBES data appear approximately unbiased (but see below), and modern multibeam surveys offer the most precise information. Calder's findings suggest that some older VBES soundings are also biased because they were digitized from paper charts for which data were first rounded to the next shallowest foot or fathom (listed in the metadata as "smooth sheets digitized for NOS under contract"). It may be impossible to correct for biases in these data because the precise procedures followed were not recorded. More recent data entered directly into the database after collection by digital instruments are less likely to have systematic rounding error.
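The size of the shallow bias produced by hydrographic rounding can be illustrated with a short simulation. If the fractional part of the true depth is uniformly distributed within each fathom (an assumption made for this illustration only), always rounding down to the next shallowest whole fathom underestimates depth by half a fathom on average, about 0.91 m since 1 fathom = 1.8288 m. This is not the correction factor applied to the lead line surveys in this study.

```python
import math
import random

FATHOM_M = 1.8288  # 1 fathom = 6 ft = 1.8288 m


def hydrographic_round(depth_m: float) -> float:
    """Record a depth by rounding down to the next shallowest whole fathom."""
    return math.floor(depth_m / FATHOM_M) * FATHOM_M


def mean_rounding_bias(n: int = 100_000, seed: int = 0) -> float:
    """Mean (recorded - true) depth for true depths drawn uniformly from 10-100 m.

    Depths are positive downward, so a negative result is a shallow bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        true_depth = rng.uniform(10.0, 100.0)
        total += hydrographic_round(true_depth) - true_depth
    return total / n
```

The simulated bias comes out close to -0.91 m, which is of the same order as the "typically shallow-biased by <2 m" figure quoted for archival data.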

For our purposes, we applied a correction factor to lead line surveys because these were found to have a predictable, systematic error in our study area (Calder, 2006). However, we were unable to correct for probable systematic biases in other sounding methods (e.g., VBES data that went through smooth sheet digitization). Survey metadata indicate that approximately 18% of soundings were non-lead line data digitized from smooth sheets, and these would therefore benefit from some form of bias correction. We attempted to reduce the effect of these potentially biased surveys by eliminating their data wherever they fell within the footprint of more modern surveys known to be unbiased (directly digitally ingested VBES and MBES). Unfortunately, about 40% of the study area was covered only by archival lead line data and/or VBES data digitized from paper charts. These areas are less reliable and may contain systematic biases (typically shallow biases of <2 m) that are not fully reflected in our model uncertainty estimates. In places where only older, less reliable data are available, we suggest using maps of survey age (Figure 2.2) and/or estimated survey measurement error based on the sounding technique used (Figure 2.12) to supplement model-based uncertainty maps. Viewed alongside geostatistical model errors, these maps can help identify unreliable areas and areas where additional bathymetric information would improve future planning decisions.
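The rule of discarding potentially biased archival soundings wherever they fall inside the footprint of a modern, unbiased survey can be sketched with a gridded footprint. This is an assumption-laden illustration: the footprint is approximated as the set of grid cells (a hypothetical 500 m cell size in projected coordinates) that contain at least one modern sounding. The survey footprints used in this study may have been defined differently.

```python
from typing import Iterable, List, Set, Tuple

Sounding = Tuple[float, float, float]  # (x_m, y_m, depth_m) in a projected CRS


def footprint_cells(soundings: Iterable[Sounding], cell_m: float) -> Set[Tuple[int, int]]:
    """Grid cells (column, row) that contain at least one sounding."""
    return {(int(x // cell_m), int(y // cell_m)) for x, y, _ in soundings}


def merge_surveys(modern: List[Sounding], archival: List[Sounding],
                  cell_m: float = 500.0) -> List[Sounding]:
    """Keep all modern soundings plus archival soundings outside the modern footprint."""
    covered = footprint_cells(modern, cell_m)
    kept_archival = [s for s in archival
                     if (int(s[0] // cell_m), int(s[1] // cell_m)) not in covered]
    return modern + kept_archival
```

The cell size controls how aggressively older data are dropped: larger cells remove more archival soundings near the edges of modern surveys, while smaller cells can leave older soundings interleaved with modern ones.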
Figure 2.12. Mean estimated vertical measurement error for soundings
within 1 km rectangular neighborhoods. Vertical measurement errors 
were estimated based on vertical sounding technology used. For 
some surveys this was inferred from survey age. The shelf edge 
corresponds to the 200 m depth contour. 
We have purposefully neglected consideration of changes to the seafloor occurring over time. Temporal changes may or may not be reflected in spatially explicit model error estimates, depending on the ages of nearby surveys. We expect that depth errors attributable to seafloor change over time may be substantial in some areas, especially in highly dynamic areas such as those where tidal and riverine influences are great.

Finally, we note that recently developed geostatistical algorithms could be used in the future to account for 
heterogeneous measurement error among methods (Christensen, 2011) to improve accuracy and more 
appropriately weight higher quality data. 
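One simple way to weight higher-quality data more heavily is to add each sounding's own measurement-error variance to the diagonal of the ordinary kriging system. This follows the spirit of the heterogeneous-measurement-error approach cited above, though not necessarily the algorithm of Christensen (2011). The sketch assumes an exponential covariance with hypothetical sill and range values, and it solves the system by Gaussian elimination so that only the standard library is needed.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ok_weights(pts: Sequence[Tuple[float, float]], err_var: Sequence[float],
               target: Tuple[float, float], sill: float = 1.0,
               range_: float = 10.0) -> List[float]:
    """Ordinary kriging weights with per-observation measurement-error variance.

    Covariance model (hypothetical): C(h) = sill * exp(-3h / range_).
    err_var[i] is added only on the diagonal, so noisier soundings receive less weight.
    """
    def cov(p, q):
        return sill * math.exp(-3.0 * math.dist(p, q) / range_)

    n = len(pts)
    a = [[cov(pts[i], pts[j]) + (err_var[i] if i == j else 0.0) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [cov(p, target) for p in pts] + [1.0]
    return solve(a, b)[:n]       # drop the Lagrange multiplier
```

With two soundings equidistant from the target, equal error variances give equal weights; giving one of them a larger error variance shifts weight toward the more precise sounding while the weights still sum to one.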

2.5.2 Resolution 

The distance between soundings was not uniform across the study area (Figure 2.2). The length scale 
of features that can be resolved will be shorter (higher resolution) in areas with greater sounding density. 
Moreover, it is possible that our model-based uncertainties will underestimate true error in areas with sparse 
soundings, especially when very high amplitude, high frequency features are present (e.g., high frequency, 
short-wavelength sand waves). The chances of this and other problems arising from interactions of sample 
spacing and high-frequency features (e.g., aliasing) are greater when samples are both sparse and very 
regularly spaced. Fortunately, in our cross-validation and independent accuracy assessment we did not find 
evidence for significant overall underestimation of uncertainty, but localized impacts are still possible in areas 
with high frequency features relative to sample spacing (e.g., sand waves). 
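The aliasing problem can be quantified with the standard sampling relation. A regular sampling interval Δ can only represent wavelengths longer than 2Δ, and a shorter sand-wave wavelength reappears at an apparent (aliased) wavelength. The sketch below applies this relation to hypothetical sand-wave and sample-spacing values. Real surveys are neither perfectly regular nor perfectly sinusoidal, so the result only indicates the kind of distortion to expect.

```python
def aliased_wavelength(true_wavelength_m: float, spacing_m: float) -> float:
    """Apparent wavelength of a sinusoid sampled at a regular interval.

    The true spatial frequency is folded into the band [0, 1 / (2 * spacing)];
    returns inf when the aliased frequency is zero (the feature looks flat).
    """
    f = 1.0 / true_wavelength_m
    fs = 1.0 / spacing_m
    f_alias = abs(f - round(f / fs) * fs)
    return float("inf") if f_alias == 0.0 else 1.0 / f_alias
```

For example, sand waves with a 100 m wavelength sampled every 90 m would appear as features about 900 m long. Sampled at exactly their own wavelength, they would vanish entirely. A 1,000 m feature sampled every 90 m is recovered at its true wavelength.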



In general, fine-scale or very sharply defined features, such as erratics, deep sea reefs or man-made artifacts, will not be reliably resolved in our model. The spatial scale at which these features become visible depends on the relative distribution of soundings, the methods used to model spatial structure, and the output resolution of maps. Although the sounding density would support resolution approaching 1 m in some limited areas (see Figure 2.2, where sounding density exceeds 100 km 1), the output model grid size was ~85 m. The minimum resolvable length scale is twice the output resolution, or ~170 m. For the vast majority of the study area, data density was much sparser and could only resolve features at scales on the order of 10² m, and in some areas 10³ m. In general, new MBES and/or sidescan sonar surveys are needed if greater detectability at short spatial scales is required. In some cases, modern VBES surveys acquired with co-registered sidescan information may be used to identify some missed features. Additionally, Calder (2006) presents a unique method of integrating a variance term corresponding to "hydrographic oversight" of smaller features, but the term requires a very good understanding of the data and geomorphology.
2.5.3 Model Assumptions

A full discussion of the statistical assumptions inherent in the LOESS local regression and geostatistical 
approaches used here is beyond the scope of this chapter; the reader is referred to texts on the subjects (e.g., 
Cleveland and Devlin, 1988; Cressie, 1993; Chiles and Delfiner, 1999). However, it is important to note here 
that geostatistical models are not capable of reproducing the full complexity of geomorphological patterns 
(e.g., alluvial fans, sand waves) unless data are very dense. This is because geostatistical models describe 
spatial correlation as a simple function of distance between points, allowing only for very simple geometric 
anisotropy. Complex multi-point erosional and depositional patterns cannot be resolved unless they are densely
sampled. Texture-mapping approaches could possibly improve prediction of complex geomorphology (e.g., 
Boucher, 2009). Ultimately, however, collection of new multibeam bathymetry is preferable to any attempt to 
statistically reconstruct fine-level details in archival hydrographic surveys. 
Image 3.1. Example of sand waves.
Biogeography Branch. 
3.1. SUMMARY

Mapping seafloor features, including sediment 
characteristics and distribution, provides crucial 
information for a number of coastal and marine 
spatial planning applications. Seafloor maps can 
be used to help identify critical habitat areas for 
benthic organisms (e.g., clams, corals, demersal 
fish), select appropriate offshore construction 
sites, and plan sand/gravel mining operations. 

Predictive models of mean grain size, sediment composition, and hard bottom occurrence were developed for the New York study area (Figure 1.2). These new models build upon the data compilations and analytical frameworks laid out by Goff et al. (2008), Poppe et al. (2005) and Greene et al. (2010), respectively. For mean grain size and sediment composition, the models provide continuous, gridded, spatially explicit prediction surfaces and corresponding uncertainty estimates. The hard bottom occurrence model also provides a continuous gridded prediction surface representing the likelihood of hard bottom occurrence. All information was mapped on the same 30 arc-second horizontal resolution grid used to characterize ocean habitat and seabird variables in Chapters 4 and 5 of this report.

3.2. BACKGROUND 

The New York study area, like other broad continental shelf regions in the northeastern United States, is 
characterized by spatially variable seafloor features that have formed as a result of dynamic marine geological 
processes, particularly the dramatic (>100 m) rise in sea level following the last glaciation (Williams et al., 
2006; Goff et al., 2008). The present distribution of surficial sediments in the region reflects deposition, erosion,
and other sedimentological processes during this period of sea level rise (Williams et al., 2006). 

The continental shelf within the study area has relatively simple topography and slopes gently from the shore 
to the shelf edge 100-150 km from shore (Allen et al., 1983). The seafloor on the continental shelf is composed
mostly of sand which grades to silt and clay in deeper areas (Poppe et al., 1994). The relatively homogeneous
seafloor is interrupted by sporadic relic sand and gravel ridges, exposed sandstone and bedrock, dumping 
sites, dredge disposal sites, and artificial reefs (e.g., shipwrecks, lost cargo, submerged pipelines). The most 
pronounced topographic features in the study area are the Hudson Shelf Valley, which crosses the entire shelf 
at the southern end of the study area (Butman et al., 2003), and many shelf edge incisions made by submarine 
canyons. The Hudson Canyon connects to the Hudson Shelf Valley and is the largest submarine canyon on 
the U.S. Atlantic continental margin (Butman et al., 2006). 

Mapping seafloor sediment characteristics is challenging, in part, because of the high variability of sediment 
characteristics at relatively short spatial scales. Characterization of physical features of the seafloor is often 
limited by the availability of comprehensive sampling across a wide range of spatial scales (Goff et al., 2008). 
Traditional bottom grab, core, trawl and camera surveys are limited in their spatial coverage. One solution to this problem, developed in recent decades, is the use of acoustic backscatter information to characterize physical properties of seafloor sediments (e.g., Lathrop et al., 2006; De Falco et al., 2010; Harris and Stokesbury, 2010; Brown et al., 2011).

Multibeam and sidescan sonar surveys completed in the New York study area (see Figure 2.1) recorded 
acoustic backscatter or reflectance data. If appropriately processed, this information can provide fine-scale 
sediment composition maps in areas of coverage. Acoustic backscatter data are not, however, uniformly available
across the NY study area. Moreover, Goff et al. (2008) stressed geographic variability in the relationship 
between acoustic backscatter and sediment characteristics owing to differences in environmental factors like 
bathymetric slope and water column properties. They suggested that extensive direct sampling of surficial 
sediments is needed to assess the correlation between backscatter intensity and sediment character from 
region to region, which limits its utility to densely ground-truthed areas. 

An alternative method is to statistically model the spatial distribution of sediment using large databases 
assembled from many surveys. Goff et al. (2008) proposed and illustrated this method using the recently 
compiled U.S. Geological Survey (USGS) Atlantic Coast usSEABED database (Reid et al., 2005). The 
usSEABED database provides an extensive and heterogeneous collection of seabed survey data derived from 
a number of sources. The database includes both "extracted" information derived from analytical measurements 
and "parsed" information that is inferred from word-based descriptions (Reid et al., 2005). Goff et al. (2008) 
found that usSEABED parsed and extracted mean grain size data were suitable for use in sediment mapping 
even though data were collected using a range of methodologies across several decades, provided they were 
appropriately quality-controlled and bias-corrected. The predictive models described in this chapter are built 
on quality-controlled and bias-corrected usSEABED data. 

Two notable additional mapping efforts have produced sediment grain size distribution maps for the U.S. 
Atlantic coast. First, the USGS Continental Margin Mapping (CONMAP) Program developed a coarse-scale 
sediment grain size distribution map for the U.S. East Coast continental margin through the analysis and 
compilation of thousands of sediment samples, many of which are part of usSEABED. The CONMAP sediment 
data layer is a vector dataset with polygons classified according to the dominant surficial sediment type (Poppe 
et al., 2005). The metadata provided with the CONMAP sediment data layer indicates that this dataset does 
not capture localized features of sediment distribution and should be used mainly to describe regional trends in 
sediment grain size distribution (Poppe et al., 2005). It is therefore useful mainly as a qualitative mapping aid. 
Second, as part of the Northwest Atlantic Marine Ecoregional Assessment, The Nature Conservancy (TNC) 
produced a map of soft sediment characteristics using an interpolation of mean grain size point data from the 
usSEABED sediment database (Greene et al., 2010). A point dataset of hard bottom locations derived from 
the usSEABED dataset and National Marine Fisheries Service (NMFS) bottom trawl survey data was overlaid 
on the soft sediment map to identify hard bottom areas (Greene et al., 2010). The TNC maps were aimed at 
broad scale regional planning and did not provide a spatial assessment of map accuracy. 

The present study aims to build on previous mapping efforts by developing maps, with accompanying spatially explicit accuracy maps, that are more appropriate for fine-scale planning decisions in the NY study area.

3.3. METHODS 

3.3.1. Study Region and Grid 

Predictions for mean grain size, sediment composition and the likelihood of hard bottom occurrence were 
made on a 30 arc-second spatial resolution geographic grid spanning the New York study area. The same 
grid was applied to predict oceanographic variables (Chapter 4) and seabird distributions (Chapter 6). The 30 
arc-second grid has a north-south linear dimension of 0.927 km and an average east-west linear dimension of 
0.814 km in the study area. For simplicity, decimal degrees were used to keep track of grid cell centroids and 
measure distances using a simple elliptical geodetic approximation; the effects of this simplifying assumption 
were negligible given the size of our study region and grid configuration (potential errors in linear distances < 
50% of grid cell horizontal resolution). 



3.3.2. Mean Grain Size 

Data Preparation 

We obtained the quality-controlled, bias-corrected, merged parsed and extracted database of mean sediment grain size (φ) described in Goff et al. (2008) from the lead author of that study (Dr. John Goff, University of Texas at Austin). Mean grain size is reported in φ units, where φ = -log₂(mean grain diameter in mm) (Krumbein and Sloss 1963). In this scale, gravel corresponds to -6 to -1 φ, sand corresponds to -1 to 4 φ, and mud corresponds to 4 to 12 φ.
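The φ scale and the broad gravel, sand and mud ranges above translate directly into code. This sketch follows the definitions in the text. The function names are illustrative, and assigning boundary values (exactly -1 φ or 4 φ) to the finer class is an arbitrary choice made here.

```python
import math


def phi_from_mm(diameter_mm: float) -> float:
    """Krumbein phi: phi = -log2(d), with d the grain diameter in mm."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    return -math.log2(diameter_mm)


def mm_from_phi(phi: float) -> float:
    """Inverse of phi_from_mm: grain diameter in mm."""
    return 2.0 ** (-phi)


def broad_class(phi: float) -> str:
    """Broad class from the ranges in the text; boundaries go to the finer class."""
    if phi < -1.0:
        return "gravel"  # text gives -6 to -1 phi; coarser grains also land here
    if phi < 4.0:
        return "sand"    # -1 to 4 phi
    return "mud"         # 4 to 12 phi
```

Because φ is a negative logarithm, larger φ means finer sediment: a 2 mm grain is -1 φ, a 0.0625 mm grain is 4 φ, and a 256 mm boulder is -8 φ.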

Goff's dataset was derived from the publicly available usSEABED Atlantic Coast Offshore Surficial Sediment 
Data Release, version 1.0 (Reid et al., 2005). The original usSEABED extracted and parsed datasets were 
filtered to remove records that did not relate to surficial sediments. At locations with multiple records pertaining 
to surficial sediments, mean grain size was averaged. Since the laboratory-based analyses used to generate 
the extracted data may exclude hard components like shell and gravel and may therefore introduce a bias 
toward finer particles (Williams et al., 2006; Harris and Stokesbury, 2010), the parsed data were bias-corrected 
as described in Goff et al. (2008) prior to merging the extracted and parsed data. Goff's data covered only the 
mid-Atlantic portion of the U.S. Atlantic coast. We extracted a subset of the data including the NY study area 
for further analysis. 

Development of the Mean Grain Size Model 

A geostatistical modeling approach was used to predict a continuous, gridded surface for surficial sediment 
mean grain size from scattered sediment survey point data and to generate corresponding spatially-explicit 
uncertainty estimates. The same general modeling approach used for the bathymetry prediction and described 
in Section 2.3.1 was used for this analysis and is not reiterated here (see Cressie 1993 and Figure 2.3 for work 
flow). All geostatistical modeling steps were performed in ArcGIS 1 using the Spatial Analyst and Geostatistical 
Analyst toolboxes (ESRI 2011a). 

The deterministic mean trend was estimated using local polynomial interpolation (LPI), a semi-parametric local regression technique that creates a prediction surface by fitting polynomial functions of a specified degree to data in overlapping search neighborhoods defined by a constant search radius, or bandwidth (ESRI 2011a). LPI uses weighted least-squares regression, with weights equal to zero outside the search neighborhood and given by a Gaussian function of distance inside it. LPI was chosen over other techniques because it provides approximate parametric confidence intervals, and because the bandwidth can be adjusted to ensure that only broad-scale trends are captured, leaving more localized information in the residuals. LPI outputs are a prediction surface; an approximate parametric prediction standard error surface, indicating the uncertainty associated with the prediction at each location; and a spatial condition number surface, indicating the stability of the local regression model at each location. Higher spatial condition number values indicate that the solution is less stable, such that small variations in the input data (e.g., uncertainty due to measurement error) can result in large variations in the prediction. For second order polynomials, the critical spatial condition number threshold value is 100 (Golub and Van Loan, 1996), meaning that predictions should be considered with caution at locations where the spatial condition number is close to 100 and should be considered unreliable where it is greater than 100. We used second degree (quadratic) polynomials, an eight-sector circular search neighborhood with a 1 decimal degree bandwidth (~111 km), and Gaussian kernel weights. At least 10 and no more than 250 data from each sector were used to produce each trend prediction. The eight-sector neighborhood search was used to mitigate the effects of uneven sample distribution on the trend surface estimation.
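A stripped-down version of the local polynomial trend estimate can be written as kernel-weighted least squares on a quadratic surface. This is a sketch, not ESRI's implementation. The kernel form exp(-0.5·(3d/bandwidth)²), the absence of eight-sector neighbor selection and minimum/maximum counts, and the function names are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_quadratic_trend(data: Sequence[Tuple[float, float, float]],
                          x0: float, y0: float, bandwidth: float) -> float:
    """Trend value at (x0, y0) from a kernel-weighted local quadratic fit.

    Weights are zero beyond `bandwidth` and Gaussian in distance inside it.
    Coordinates are centred on the target, so the fitted intercept is the prediction.
    """
    ata = [[0.0] * 6 for _ in range(6)]
    atz = [0.0] * 6
    for x, y, z in data:
        dx, dy = x - x0, y - y0
        d = math.hypot(dx, dy)
        if d > bandwidth:
            continue
        w = math.exp(-0.5 * (3.0 * d / bandwidth) ** 2)
        basis = [1.0, dx, dy, dx * dx, dx * dy, dy * dy]
        for i in range(6):
            atz[i] += w * basis[i] * z
            for j in range(6):
                ata[i][j] += w * basis[i] * basis[j]
    return _solve(ata, atz)[0]
```

The 6×6 weighted normal-equation matrix built here is what the spatial condition number describes: when neighbors are few or poorly spread, the matrix approaches singularity and small changes in the data move the fitted trend substantially.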

Residuals were obtained by subtracting the trend surface prediction at each data location from the observed data value. Residuals were checked for normality by examining histograms and normal QQ plots. A sample semivariogram of the residuals was then calculated in ArcGIS 10 using the Geostatistical Analyst extension (ESRI, 2011a). Lag size was selected by examining the distribution of nearest-neighbor distances and choosing the smallest lag size that would allow sufficient samples to estimate semivariance values near the origin. Anisotropy (i.e., changes in spatial autocorrelation with direction) was checked using directional semivariograms. To model spatial autocorrelation in the residuals, a constrained weighted least squares algorithm was used to fit an exponential anisotropic model to the sample semivariogram (model parameters in Table 3.1).
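The sample semivariogram bins point pairs by separation distance and averages half their squared differences, γ(h) = 1/(2N(h)) Σ (r_i − r_j)². The omnidirectional sketch below shows only that calculation. The directional semivariograms and the constrained weighted least squares fit performed in ArcGIS are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def sample_semivariogram(points: Sequence[Tuple[float, float, float]],
                         lag: float, n_lags: int) -> List[Tuple[float, float, int]]:
    """Omnidirectional sample semivariogram of (x, y, residual) points.

    Bin k holds pairs with separation in [k * lag, (k + 1) * lag). Returns one
    (mean separation, semivariance, pair count) tuple per non-empty bin.
    """
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, ri = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, rj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag)
            if k < n_lags:
                sums[k] += 0.5 * (ri - rj) ** 2
                dists[k] += h
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]
```

The pair loop is O(n²), which is manageable for a sketch but slow for the 14,612 grain size records; production tools typically restrict pairs to the maximum lag distance. Reporting the pair count per bin makes it easy to see whether the smallest lags, which drive the nugget estimate, are adequately populated.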



Fitted semivariogram model parameters were used to perform ordinary kriging (OK) of the residuals in ArcGIS 
10 using eight-sector search neighborhoods and anisotropic search neighborhood radii equal to the radii of 
the anisotropic semivariogram model. At least 5 and no more than 25 data points from each sector were used 
to produce kriging predictions on a 30 arc-second grid. Kriging predictions were generated at the centroid 
of each grid cell. We confirmed that centroids did not intersect with data point locations, so that the nugget 
effect (measurement error and small-scale variance) was "filtered out" of the kriging prediction. This resulted 
in a desirable noise reduction in the prediction surface, equivalent to the maximum a posteriori resampling 
algorithm used by Goff et al. (2006, 2008). 

Table 3.1. Semivariogram parameters for the mean grain size model.

PARAMETER                           MEAN GRAIN SIZE (a)
n                                   14,612
Type of variogram model             Exp
No. of lags                         50
Lag size (km) (b)                   1.11
Nugget                              1.26
Major range (km) (b)                24.44
Minor range (km) (b)                12.22
Direction (°) (c)                   288.05
Partial sill                        0.98
% of the sill due to the nugget     56.25

a - in φ units, where φ = -log₂(mean grain diameter in mm); b - converted from decimal degrees to kilometers using 111.1 km/decimal degree;
c - clockwise from North; Exp = Exponential

The model surface representing the predicted mean grain size was calculated as the sum of the trend (LPI) and 
residual (kriging) prediction surfaces. The corresponding prediction standard error surface was calculated as 
the square root of the sum of the trend and kriging prediction variances (errors in the trend and residual surfaces 
are assumed to be independent). The final prediction and prediction standard error surfaces were exported as 
ESRI grids with the extent and spatial resolution described in Section 3.3.1. An error mask was applied to the
output grids to exclude areas where the kriging standard error was greater than 97.5% of the residual sample 
standard deviation. This error mask was applied to all surficial sediment outputs for consistency. 
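The way the trend and residual surfaces combine, and how the error mask is defined, can be sketched as follows. This assumes the trend and kriging variances are available as arrays on the same grid; `combine_surfaces` is a hypothetical name, and encoding masked cells as NaN is an assumption (the report does not state how masked cells are stored).

```python
import numpy as np

def combine_surfaces(trend, trend_var, krig, krig_var, resid_sd, mask_fraction=0.975):
    """Final prediction and standard error grids; masked cells are returned as NaN."""
    trend, trend_var, krig, krig_var = (np.asarray(a, dtype=float)
                                        for a in (trend, trend_var, krig, krig_var))
    prediction = trend + krig                        # trend + kriged residual
    std_error = np.sqrt(trend_var + krig_var)        # independent errors: variances add
    masked = np.sqrt(krig_var) > mask_fraction * resid_sd   # kriging SE vs. residual SD
    return np.where(masked, np.nan, prediction), np.where(masked, np.nan, std_error)
```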

In addition to the prediction and prediction standard error maps, a vector dataset of polygons classified
by mean grain size class was generated by assigning each grid cell a mean grain size class using the
classification scheme of Wentworth (1922). Finally, the probabilities of mean grain size exceeding the thresholds
25.6 cm (φ < -8, boulders and larger), 6.4 cm (φ < -6, cobbles and larger), 2 mm (φ < -1, pebbles and larger),
and 0.062 mm (φ < 4, very fine sand and larger) were mapped by integrating under the normal distribution
defined for each grid cell by the OK prediction mean and variance.
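A small standard-library sketch of the two per-cell operations described above: assigning a Wentworth class from a mean grain size in φ, and computing the probability that φ falls below a threshold under a normal distribution. The class boundaries follow Table 3.3 plus the -8 φ and -6 φ boulder and cobble limits; assigning a value that lies exactly on a boundary to the finer class is an assumption, and the function names are hypothetical. The standard deviation passed in would be the cell's prediction standard error.

```python
import math
from bisect import bisect_right

# Wentworth (1922) class boundaries in phi units (phi = -log2(diameter in mm))
PHI_EDGES = [-8, -6, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
CLASSES = ["boulders", "cobbles", "pebbles", "granules", "very coarse sand",
           "coarse sand", "medium sand", "fine sand", "very fine sand",
           "coarse silt", "medium silt", "fine silt", "very fine silt", "clay"]

def wentworth_class(phi):
    """Class of a mean grain size in phi; a value on a boundary goes to the finer class."""
    return CLASSES[bisect_right(PHI_EDGES, phi)]

def prob_coarser_than(threshold_phi, mean_phi, sd_phi):
    """P(phi < threshold) under a normal distribution with the cell's mean and SD."""
    z = (threshold_phi - mean_phi) / sd_phi
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

def phi_to_mm(phi):
    """Grain diameter in mm from phi units."""
    return 2.0 ** (-phi)

# The four thresholds mapped in the report, in phi units
THRESHOLDS = {"boulders and larger": -8, "cobbles and larger": -6,
              "pebbles and larger": -1, "very fine sand and larger": 4}
```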

3.3.3. Sediment Composition 

Data Preparation 

Seabed survey data from the parsed and extracted databases of the usSEABED Atlantic Coast Offshore Surficial
Sediment Data Release, version 1.0 (Reid et al., 2005), were used to develop models of the fractional composition
of mud, sand, and gravel in surficial sediments in the study area. Data were downloaded from the USGS
publications website (http://pubs.usgs.gov/ds/2005/118/htmldocs/usseabed.htm). The dataset provided by Dr.
Goff for the mean grain size analysis was not used for sediment composition modeling because it did not
include percentages of mud, sand, and gravel. The survey point data were filtered to remove duplicate points
and points not relating to surficial sediments. Survey records were removed if the "sample phase" attribute for
the database record indicated that the sample was clearly not from surficial sediments (e.g., from the bottom
of the sample core). When multiple records referred to the same sample core, the record that described the
top of the sample core (as indicated by the "sample top" attribute) was retained and the other records were
removed. When multiple records pertaining to surficial sediments existed at a location, values for percent mud,
percent sand, and percent gravel were averaged. When the sum of the three percentages exceeded 100%, each of
the mud, sand, and gravel percentages was divided by the sum to re-normalize the total to 100% (sums seldom
exceeded 110%, so any error introduced by this re-normalization would be small). Separate datasets for each
sediment type were extracted, excluding records with "no data" for the given sediment type (but including 0%
values). Percentages were converted to fractional values between 0 and 1 for subsequent processing.
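The averaging, re-normalization, and per-type extraction steps can be sketched in plain Python as below. This assumes the records have already been filtered to surficial samples using the "sample phase" and "sample top" attributes; the field names (`lon`, `lat`, `mud`, `sand`, `gravel`), the use of `None` for "no data", and the function name are hypothetical, not the usSEABED schema.

```python
from collections import defaultdict
from statistics import mean

FRACTIONS = ("mud", "sand", "gravel")

def prepare_composition(records):
    """Average surficial records per location, re-normalize sums above 100%,
    and return one list of (location, fraction) pairs per sediment type."""
    by_location = defaultdict(list)
    for rec in records:
        by_location[(rec["lon"], rec["lat"])].append(rec)
    datasets = {name: [] for name in FRACTIONS}
    for loc, recs in by_location.items():
        avg = {}
        for name in FRACTIONS:
            values = [r[name] for r in recs if r[name] is not None]   # None = "no data"
            avg[name] = mean(values) if values else None
        total = sum(v for v in avg.values() if v is not None)
        if total > 100.0:                                            # re-normalize to 100%
            avg = {k: (None if v is None else 100.0 * v / total) for k, v in avg.items()}
        for name, value in avg.items():
            if value is not None:                                    # 0% values are kept
                datasets[name].append((loc, value / 100.0))          # percent -> fraction
    return datasets
```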



Development of the Sediment Composition Models 

The same geostatistical approach used to model mean grain size (see Section 3.3.2) was applied separately 
to each of the individual sediment type datasets. The deterministic mean trend was fit using the same LPI 
parameters as described for the mean grain size model in Section 3.3.2. Specific exponential anisotropic 
models for each sediment type were fitted to the sample semivariogram as described in Section 3.3.2 (model 
parameters are shown in Table 3.2). Fitted semivariogram model parameters were used to perform ordinary 
kriging (OK) following the same procedures outlined in Section 3.3.2. 

Table 3.2. Semivariogram parameters for the sediment composition models.

| Data | n | Type of variogram model | No. of lags | Lag size (km)^a | Nugget | Major range (km)^a | Minor range (km)^a | Direction (°)^b | Partial sill | % of the sill due to the nugget |
|---|---|---|---|---|---|---|---|---|---|---|
| Mud fraction | 30,126 | Exp | 50 | 1.11 | 0.064 | 24.44 | 12.22 | 82.44 | 0.15 | 29.91 |
| Sand fraction | 30,127 | Exp | 50 | 1.11 | 0.082 | 24.44 | 12.22 | 77.87 | 0.16 | 33.88 |
| Gravel fraction | 30,115 | Exp | 50 | 1.11 | 0.04 | 24.44 | 12.22 | 141.5 | 0.045 | 47.06 |

a - converted from decimal degrees to kilometers using 111.1 km/decimal degree; b - clockwise from North; Exp = Exponential



Maps representing the predicted fraction for each sediment type were calculated as the sum of the trend
(LPI) and residual (kriging) prediction surfaces. The corresponding prediction standard error surfaces were
calculated as the square root of the sum of the trend and kriging prediction variances (errors in the trend and
residual surfaces are assumed to be independent). The final prediction and prediction standard error surfaces
were exported as ESRI grids with the extent and spatial resolution described in Section 3.3.1. An error mask
was applied to the output grids as described in Section 3.3.2.

Following Goovaerts (1997) and Deutsch and Journel (1998), each sediment type prediction surface was
corrected for order violations by setting values less than zero to zero and values greater than one to one and 
by dividing each prediction value by the sum of the three prediction surfaces where their sum exceeded one. 
Where the sum was less than one we did not divide by the sum of the prediction surfaces, as some sediment 
could have been neither mud, sand, nor gravel (e.g., clay). 
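The order-violation correction described above amounts to a clip followed by a conditional rescale. A minimal NumPy sketch, with a hypothetical function name, operating on co-registered prediction grids:

```python
import numpy as np

def correct_order_violations(mud, sand, gravel):
    """Clip each fraction to [0, 1], then rescale cells whose fractions sum above 1."""
    m, s, g = (np.clip(np.asarray(a, dtype=float), 0.0, 1.0) for a in (mud, sand, gravel))
    total = m + s + g
    scale = np.where(total > 1.0, total, 1.0)     # sums below 1 are left unchanged
    return m / scale, s / scale, g / scale
```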

In addition to the prediction and prediction standard error maps for each sediment type, a vector dataset with 
polygons classified by sediment texture classes was generated by assigning each grid cell a sediment texture 
class using the Folk classification scheme (Folk 1954, 1974) based on the predicted ratios of sediment types. 
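A sketch of the Folk texture assignment for a single grid cell, using the gravel percentage and sand:mud ratio limits listed in Table 3.5. Inputs are fractions between 0 and 1, so the 0.01% gravel limit becomes 0.0001. Which class receives a value lying exactly on a limit is an assumption, and `folk_class` is a hypothetical name.

```python
def folk_class(gravel, sand, mud):
    """Folk (1954, 1974) texture class from gravel, sand, and mud fractions (0-1)."""
    ratio = float("inf") if mud == 0 else sand / mud      # sand:mud ratio
    if gravel >= 0.80:
        return "gravel"
    if gravel >= 0.30:                                     # 30-80% gravel
        if ratio < 1:
            return "muddy gravel"
        return "muddy sandy gravel" if ratio < 9 else "sandy gravel"
    if gravel >= 0.05:                                     # 5-30% gravel
        if ratio < 1:
            return "gravelly mud"
        return "gravelly muddy sand" if ratio < 9 else "gravelly sand"
    prefix = "slightly gravelly " if gravel >= 0.0001 else ""   # 0.01-5% gravel
    if ratio < 1 / 9:
        base = "mud"
    elif ratio < 1:
        base = "sandy mud"
    elif ratio < 9:
        base = "muddy sand"
    else:
        base = "sand"
    return prefix + base
```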

3.3.4. Hard Bottom Occurrence 

Data Preparation 

An integrated point dataset of known hard bottom locations was built from three sources of seabed survey 
data and used to develop a prediction surface for the likelihood of hard bottom occurrence. First, we identified 
locations in the usSEABED Atlantic Coast Offshore Surficial Sediment Data Release, version 1.0 (Reid et 
al., 2005) parsed and extracted databases where the Shepard code for the point was "solid" or the rock 
membership value was greater than zero (for description of the rock membership value see Reid et al., 
2005). Second, we searched the "National Oceanic and Atmospheric Administration (NOAA)/National Ocean 
Service (NOS) and U.S. Coast and Geodetic Survey (USCGS) Bottom Type Descriptions from Hydrographic 
Surveys" database archived at the NOAA National Geophysical Data Center (NOAA NGDC 2011) for point 
locations where hydrographic survey annotations had described the bottom type as hard or rocky. Third, The 
Nature Conservancy (TNC) provided a hard bottom point dataset compiled from information in the usSEABED 
database (Reid et al., 2005) and the National Marine Fisheries Service (NMFS) bottom trawl dataset as part 
of the Northwest Atlantic Marine Ecoregional Assessment (Greene et al., 2010; J. Greene, pers. comm.;
M. Fogarty, pers. comm.). We merged points from these three data sources and removed duplicate records with
identical geographic coordinates. Sample distribution bias can have strong effects on presence-only models
(Phillips et al., 2009; Elith et al., 2011). To create a dataset with more uniformly distributed sampling effort,
we removed hard bottom points in densely surveyed nearshore areas.
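Merging the three point sources and dropping exact coordinate duplicates is simple to express. The report does not say how points were removed in densely surveyed nearshore areas, so the grid-cell thinning shown here (keep one point per cell) is a stand-in technique chosen for illustration. The cell size and both function names are hypothetical.

```python
import math

def merge_presences(*sources):
    """Merge point lists, dropping exact duplicate (lon, lat) coordinates."""
    seen, merged = set(), []
    for source in sources:
        for lon, lat in source:
            if (lon, lat) not in seen:
                seen.add((lon, lat))
                merged.append((lon, lat))
    return merged

def thin_by_grid(points, cell_deg):
    """Keep the first point that falls in each cell_deg x cell_deg grid cell."""
    occupied, kept = set(), []
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        if cell not in occupied:
            occupied.add(cell)
            kept.append((lon, lat))
    return kept
```

In a workflow like the one described, thinning would be applied only to the densely surveyed nearshore subset, leaving offshore points untouched.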

Development of the Hard Bottom Occurrence Model 

In contrast to other predictive models developed in this report, a geostatistical model could not be applied to the 
hard bottom point data, because the available hard bottom datasets were restricted to presences, rather than 
absences, of hard bottom. The lack of absence data arises because hard bottom is very patchily distributed 
even at very small scales (centimeters to meters). A point sample such as a sediment core that brings up soft 
sediment does not preclude the presence of hard bottom in the immediate vicinity. Any geostatistical model 
developed using the unreliable absence data would be heavily biased and uninformative. Reliable absence 
data for hard bottom generally require diver, remotely operated vehicle (ROV), photo, video, or acoustic
backscatter surveys that continuously cover large swaths of seafloor, which is generally impractical in deeper waters.

For this reason, a maximum entropy (MaxEnt) model was used to predict the likelihood of hard bottom 
occurrence by combining the presence-only hard bottom point dataset with potential predictor variables 
(Phillips et al., 2006; Phillips and Dudik, 2008). This approach can be thought of as creating a "suitability map" 
for the presence of hard bottom patches, analogous to habitat suitability maps developed for organisms (Elith 
et al., 2011). A full description of the MaxEnt algorithm is beyond the scope of this document (see Elith et al., 
2011). Briefly, MaxEnt produces an estimate of the relative likelihood of a feature's occurrence at each location 
in a specified grid, assuming that presences take on the most spatially random (uniform) distribution possible 
under the constraint that, for each environmental predictor variable, the expected value under the estimated
distribution matches the mean observed at the presence locations (Elith et al., 2006; Phillips et al., 2006; Peterson et al., 2007). MaxEnt
models are trained on a subset of the data and validated by testing predictions on remaining data. MaxEnt 
has been shown to perform well compared to other presence-only approaches (Elith et al., 2006; Phillips and 
Dudik, 2008), and is readily implemented using free, open-source software (Phillips et al., 2006, downloadable 
at http://www.cs.princeton.edu/~schapire/maxent/). 
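The core principle described above can be illustrated in a few lines: fit the Gibbs (exponential-family) distribution over background grid cells whose expected predictor values match the means at presence cells. This minimal NumPy sketch uses linear features only, no regularization, and plain gradient ascent. The MaxEnt program itself also uses other feature types (e.g., quadratic, product, hinge) and L1 regularization, so this illustrates the principle rather than reimplementing the program; `fit_maxent_linear` is a hypothetical name.

```python
import numpy as np

def fit_maxent_linear(features, presence_idx, lr=0.2, n_iter=5000):
    """Maximum entropy (Gibbs) distribution over background cells, linear features only.

    features: (n_cells, k) array of (transformed) predictors for every grid cell.
    presence_idx: indices of cells containing presence records.
    Returns the feature weights and the raw output (probabilities summing to 1).
    """
    x = np.asarray(features, dtype=float)
    target = x[presence_idx].mean(axis=0)       # empirical feature means at presences
    lam = np.zeros(x.shape[1])
    for _ in range(n_iter):
        s = x @ lam
        p = np.exp(s - s.max())
        p /= p.sum()                            # p(cell) proportional to exp(lam . f(cell))
        lam += lr * (target - p @ x)            # gradient of the mean log-likelihood
    s = x @ lam
    p = np.exp(s - s.max())
    return lam, p / p.sum()
```

At convergence, the model's expected feature values equal the presence means (the constraint stated above); among all distributions satisfying that constraint, this Gibbs form has maximum entropy.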

Environmental predictor variables used to train the model of hard bottom occurrence included mean grain 
size, depth, slope, slope of slope, bathymetric variance, distance from shore, signed distance from shelf, 
sea surface chlorophyll concentration, and turbidity. All of these predictors are described in Chapters 4 and 
6, with the exception of bathymetric variance (calculated as the standard deviation of the depth in ~900 m
rectangular neighborhoods). The same transformations described in Appendix 6.B were applied to mean grain 
size, depth, slope, slope of slope, distance from shore, signed distance from shelf, chlorophyll, and turbidity. 
Although transformation is not strictly necessary for MaxEnt, we found that transforming predictors improved 
cross-validation model performance. Eighty percent of the hard bottom presence points were used to build 
(train) the model and 20% of the points were randomly withheld to test the model. MaxEnt is more robust than 
regression techniques to the inclusion of large sets of potential predictor variables (Elith et al., 2011), so no 
model selection was carried out to reduce the size of this predictor set. 
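Two of the preparatory steps above, bathymetric variance as a moving-window standard deviation and the random 80/20 split of presence points, are sketched below. The report does not state the bathymetry grid resolution, so the default 3 x 3 window (roughly 900 m if cells are about 300 m) is an assumption. The sketch also assumes the window size is odd, uses the population SD, and ignores cells outside the grid. Function names are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def focal_std(depth, size=3):
    """Standard deviation of depth in a size x size moving window (size must be odd).
    Cells outside the grid are ignored via NaN padding; population SD (ddof=0)."""
    d = np.asarray(depth, dtype=float)
    pad = size // 2
    padded = np.pad(d, pad, mode="constant", constant_values=np.nan)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.nanstd(windows, axis=(-2, -1))

def split_presences(n_points, test_fraction=0.2, seed=0):
    """Random train/test split of presence-point indices (80/20 by default)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_test = int(round(test_fraction * n_points))
    return order[n_test:], order[:n_test]
```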

MaxEnt provides three post-hoc assessments of the relative importance of predictor variables. First, the MaxEnt 
program provides a summary of how much each predictor contributes to the gain of the model, accumulated 
for each predictor over the course of the training algorithm. Second, the MaxEnt program randomly permutes 
the values for each predictor (one at a time) and determines the resulting decrease in the area under the 
training model receiver operating characteristic curve (AUC). This provides a measure of how strongly the 
model depends on each predictor. Third, the MaxEnt program estimates predictor importance using a jackknife 
approach, in which it re-runs the model for each predictor, first building the model with all variables except 
the predictor of interest, and then building the model with only the predictor of interest. If a predictor is highly
correlated with the other predictors, withholding it will have little impact on model performance. Conversely, an
important and non-redundant predictor will have high explanatory power by itself, and its omission from the
model will result in a substantial reduction in predictive power.
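The second assessment (permutation importance) can be sketched generically: shuffle one predictor at a time across the pooled presence and background samples and record how much the training AUC drops. This is a simplified NumPy illustration, not MaxEnt's code. `score_fn` stands for any fitted model's scoring function, the function names are hypothetical, and the raw AUC decreases are returned, whereas the MaxEnt program reports these decreases normalized to percentages.

```python
import numpy as np

def auc(presence_scores, background_scores):
    """Probability a random presence outranks a random background cell (ties count 1/2)."""
    p = np.asarray(presence_scores, dtype=float)[:, None]
    b = np.asarray(background_scores, dtype=float)[None, :]
    return ((p > b).sum() + 0.5 * (p == b).sum()) / (p.size * b.size)

def permutation_importance(score_fn, x_presence, x_background, seed=0):
    """Drop in training AUC when each predictor column is randomly permuted."""
    rng = np.random.default_rng(seed)
    x_presence = np.asarray(x_presence, dtype=float)
    x_background = np.asarray(x_background, dtype=float)
    base = auc(score_fn(x_presence), score_fn(x_background))
    pooled = np.vstack([x_presence, x_background])
    n_p = len(x_presence)
    drops = []
    for j in range(pooled.shape[1]):
        shuffled = pooled.copy()
        shuffled[:, j] = rng.permutation(shuffled[:, j])   # break the link for predictor j only
        drops.append(base - auc(score_fn(shuffled[:n_p]), score_fn(shuffled[n_p:])))
    return np.array(drops)
```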



The final map of the "hard bottom occurrence likelihood index" consisted of a logistic transformation of MaxEnt's 
raw output to produce a smooth index between 0 and 1 (this is the default output of the MaxEnt program).
It is related to the probability of occurrence, but is not strictly a probability (Elith et al., 2011). It should be 
considered an index of the relative likelihood of hard bottom occurrence, rather than a strict measure of the 
probability of encountering hard bottom. All of the issues and caveats related to interpretation of ecological 
models based on presence-only data are applicable (Elith et al., 2011). 
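For reference, the logistic output of MaxEnt 3.x, as described by Phillips and Dudík (2008), rescales the raw output P (which sums to one over the grid) as c·P / (1 + c·P), where c = exp(H) and H is the entropy of the raw distribution. A minimal NumPy sketch under that assumption, with a hypothetical function name:

```python
import numpy as np

def maxent_logistic(raw):
    """Logistic transform of MaxEnt raw output: c*P / (1 + c*P), with c = exp(entropy)."""
    p = np.asarray(raw, dtype=float)
    p = p / p.sum()                          # raw output sums to 1 over the grid
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))       # entropy H of the raw distribution
    c = np.exp(entropy)
    return c * p / (1.0 + c * p)
```

For a perfectly uniform raw output, every cell maps to exactly 0.5, which shows that the index is a monotone rescaling of relative likelihood rather than a calibrated probability.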

3.3.5. Model Validation 

The performance of the mean grain size and sediment composition models was evaluated by two methods:
leave-one-out cross-validation and qualitative comparison to an independent sidescan sonar backscatter dataset.
Leave-one-out cross-validation of kriging predictions was performed in ESRI Geostatistical Analyst (ESRI,
2011a) as described in Goovaerts (1997). Cross-validation statistics were calculated as described in ESRI
(2011b). Qualitative comparisons to acoustic backscatter data followed Goff et al. (2008). Model prediction
maps were presented alongside existing 100-120 kHz backscatter data collected by USGS in the New York
Bight region (Schwab et al., 2000; Schwab et al., 2002) and visually interpreted. In general, acoustic backscatter
intensity is lower where there are fine sediments and higher where there are coarse sediments (Ferrini and
Flood, 2006; De Falco et al., 2010).
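The leave-one-out procedure and the four diagnostic statistics reported later (Tables 3.4 and 3.6) can be sketched as below. This assumes the usual definitions (prediction error = predicted - observed; standardized error = error / prediction standard error); ESRI (2011b) documents the exact formulas used by the software. The function names are hypothetical, and for kriging `predict` would be the kriging routine with the fitted semivariogram model held fixed.

```python
import numpy as np

def leave_one_out(xy, z, predict):
    """Predict each observation from all the others.
    predict(xy_train, z_train, x0) must return (prediction, variance)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    preds, ses = [], []
    for k in range(len(z)):
        keep = np.arange(len(z)) != k          # withhold observation k
        p, v = predict(xy[keep], z[keep], xy[k])
        preds.append(p)
        ses.append(np.sqrt(v))
    return np.array(preds), np.array(ses)

def cv_statistics(observed, predicted, std_error):
    obs, pred, se = (np.asarray(a, dtype=float) for a in (observed, predicted, std_error))
    err = pred - obs                           # prediction error
    std_err = err / se                         # standardized prediction error
    return {
        "bias": err.mean(),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mean_standardized": std_err.mean(),
        "rms_standardized": np.sqrt(np.mean(std_err ** 2)),
    }
```

A root-mean-square standardized error near one indicates that the reported standard errors match the actual scatter of the errors, which is how the text interprets Tables 3.4 and 3.6.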

In addition to qualitative comparison with acoustic backscatter data, the hard bottom occurrence model
performance was evaluated using cross-validation on the 20% of data withheld from training. Cross-validation
performance was evaluated using the area under the curve (AUC) statistic of MaxEnt's receiver operating
characteristic (ROC)-like output (Fielding and Bell, 1997). Traditional ROC curves plot the true positive
rate (sensitivity) versus the false positive rate (1 - specificity) across the range of potential threshold values, with
the AUC statistic measuring how well the model maximizes the true positive rate at low values of
the false positive rate. The AUC statistic ranges from 0.5 (no better than random) to 1.0 (a perfect prediction).
An AUC statistic greater than 0.75 generally indicates a potentially useful model, and ROC curves for
high-performing models approach the upper left corner of the plot (Fielding and Bell, 1997; Peterson et al.,
2007). The ROC-like analysis used by MaxEnt differs in that it substitutes the fractional predicted area for the
false positive rate (1 - specificity), since there are no absence data from which to measure specificity (the true
negative rate). As a result, the maximum achievable AUC is less than one (Phillips et al., 2006).
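The ROC-like construction can be sketched as below. For each threshold, sensitivity is the fraction of test presences scoring at or above it, and the x-axis is the fraction of background cells (the study area) predicted present. This is a minimal NumPy illustration with hypothetical function names; the trapezoidal area under this curve equals the probability that a random presence outranks a random background cell (ties counted as one half).

```python
import numpy as np

def maxent_roc(presence_scores, background_scores):
    """ROC-like curve: sensitivity vs. fraction of background (study area) predicted present."""
    p = np.asarray(presence_scores, dtype=float)
    b = np.asarray(background_scores, dtype=float)
    thresholds = np.unique(np.concatenate([p, b]))[::-1]       # high to low
    sensitivity = np.array([(p >= t).mean() for t in thresholds])
    fraction_area = np.array([(b >= t).mean() for t in thresholds])
    return np.r_[0.0, fraction_area], np.r_[0.0, sensitivity]

def trapezoid_area(x, y):
    """Area under a piecewise-linear curve by the trapezoid rule."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

x, y = maxent_roc([0.9, 0.8], [0.1, 0.2, 0.8])
print(round(trapezoid_area(x, y), 4))   # 0.9167
```

In this toy example, one background cell shares a presence's score, much as presence cells are part of the background in MaxEnt. Even a model that ranks presences highest therefore cannot reach an AUC of 1.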

3.4. RESULTS AND DISCUSSION 
3.4.1. Mean Grain Size 

Model Predictions 

The model of mean grain size extended just past the continental shelf edge and provided predictions of mean
grain size for the majority of the study area. The model did not extend into Long Island Sound (LIS) because
Goff et al. (2008) did not include LIS in their quality-controlled dataset. An additional reason for excluding LIS
was that the geostatistical model developed for the open ocean system (most of the study area) would not
have applied to the geomorphologically distinct, enclosed Long Island Sound system. The model did not extend
far past the continental shelf edge due to lack of sufficient sampling effort. The model predicted that much of
the seafloor is covered by sediment with a mean grain size characteristic of coarse to medium sand (0-2 φ),
with areas having mean grain size characteristic of finer sand and silt (4-6 φ) within the upper reaches of the
Hudson Shelf Valley, offshore of the shelf break, and in the area of the Block Island Delta and Block Island
Valley. The model predicted that mean grain sizes characteristic of coarse to medium sand covered ~60% of
the study area, mean grain sizes characteristic of fine to very fine sand covered ~32% of the study area, and
mean grain sizes characteristic of silt covered ~8% of the study area (Figure 3.1, Figure 3.2, Table 3.3).
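The ~60%, ~32%, and ~8% figures can be checked against the class areas in Table 3.3. The sketch below assumes that the classes with a reported area make up the entire mapped area, treating the blank rows as zero.

```python
# Predicted areas (km^2) from Table 3.3; classes with no reported area are omitted
areas_km2 = {
    "pebbles": 1, "granules": 12, "very coarse sand": 38, "coarse sand": 4978,
    "medium sand": 19613, "fine sand": 8607, "very fine sand": 4402,
    "coarse silt": 2525, "medium silt": 734,
}
total = sum(areas_km2.values())

def percent_of_total(*classes):
    """Combined share of the mapped area for the named classes, in percent."""
    return 100.0 * sum(areas_km2[c] for c in classes) / total

coarse_to_medium = percent_of_total("coarse sand", "medium sand")
fine_to_very_fine = percent_of_total("fine sand", "very fine sand")
silt = percent_of_total("coarse silt", "medium silt")
print(total, round(coarse_to_medium), round(fine_to_very_fine), round(silt))
# 40910 60 32 8
```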

Model Uncertainty 

Model standard error ranged from ~1.2 φ units for grid cells with a very high density of seafloor surveys
to ~1.6 φ units for grid cells farthest from survey locations, such as the area farthest offshore (Figure 3.3).
Model uncertainty of this magnitude corresponds to theoretical 95% confidence intervals (±1.96 × standard
error) ranging from the mean grain size prediction ± 2.4 φ units in densely surveyed areas to the prediction ±
3.2 φ units where surveys were more sparse. For example, a mean grain size prediction of 1 φ in a densely
surveyed area could have a theoretical 95% confidence interval of (-1.4 φ, 3.4 φ). In this case, the mean grain
size prediction corresponds to coarse sand, but the confidence interval limits range from very fine pebbles to
very fine sand. Given that sediment surveys in areas offshore of the continental shelf break were quite sparse
and prediction accuracy in these areas is worse than in nearshore areas, model predictions offshore of the
shelf break should be used with caution.

Figure 3.1. Predicted mean grain size of surficial sediments from kriging interpolation of mean grain size data in the Mid-Atlantic Bight.
Mean grain size is in φ units, where φ = -log2(mean grain diameter in mm). Data courtesy of J. Goff (University of Texas at Austin),
derived from USGS usSEABED database (Reid et al., 2005).
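The confidence-interval example in the Model Uncertainty paragraph can be reproduced and converted back to grain diameters. This is a small sketch assuming a normal prediction error (±1.96 × standard error); the function names are hypothetical.

```python
def ci95(prediction_phi, std_error_phi):
    """Theoretical 95% confidence interval in phi units (normal errors assumed)."""
    half_width = 1.96 * std_error_phi
    return prediction_phi - half_width, prediction_phi + half_width

def phi_to_mm(phi):
    """Grain diameter in mm from phi units."""
    return 2.0 ** (-phi)

lo, hi = ci95(1.0, 1.2)
print(round(lo, 1), round(hi, 1))                          # -1.4 3.4
print(round(phi_to_mm(lo), 2), round(phi_to_mm(hi), 3))    # 2.55 0.098
```

The lower limit corresponds to a diameter of about 2.6 mm (in the 2-4 mm granule/very fine pebble range), and the upper limit to about 0.1 mm (very fine sand), matching the interpretation given in the text.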

The predicted probabilities of exceeding mean grain size thresholds of 25.6 cm (φ < -8, boulders and larger),
6.4 cm (φ < -6, cobbles and larger), 2 mm (φ < -1, pebbles and larger), and 0.062 mm (φ < 4, very fine sand
and larger) generally followed spatial patterns in mean grain size predictions. This is expected, since these
probability calculations assume that the mean grain size in each grid cell is normally distributed around the
predicted value. The probabilities of having a mean grain size less than -8 φ (boulders and larger) or less than
-6 φ (cobbles and larger) were essentially zero across the entire study area (Figure 3.4a, b). This does not mean
that boulders and cobbles do not occur, only that they almost never determine the mean grain size over any
appreciable area (they are always mixed with other, finer sediment types) and/or they are very erratic in their
occurrence (occurring only as isolated departures from the mean). The probability of having a mean grain size
less than -1 φ (pebbles and larger) was very low across most of the study area, but there were areas of higher
probability corresponding well with the areas mapped as pebbles by the mean grain size prediction model
(Figure 3.1, Figure 3.4c). The probability of having a mean grain size less than 4 φ (very fine sand and larger)
was high across the study area, with areas of near-zero probability corresponding to areas that were mapped
as silt (Figure 3.1, Figure 3.4d).

Figure 3.2. Distribution of predicted surficial sediment mean grain size classes, derived from kriging interpolation of mean grain size
data in the Mid-Atlantic Bight. Mean grain size classes are defined based on Wentworth (1922). Data courtesy of J. Goff (University of
Texas at Austin), derived from USGS usSEABED database (Reid et al., 2005).

Table 3.3. Total area and percent area of predicted mean grain size classes in the study area.

| Mean grain size class^a | Mean grain size (φ)^b | Mean grain size (metric) | Predicted total area (km²) | Predicted percent area |
|---|---|---|---|---|
| Pebbles | -6 to -2 | 4 mm - 6.4 cm | 1 | |
| Granules | -2 to -1 | 2 mm - 4 mm | 12 | |
| Very coarse sand | -1 to 0 | 1 mm - 2 mm | 38 | 0.1 |
| Coarse sand | 0 to 1 | 0.5 mm - 1 mm | 4,978 | 12.2 |
| Medium sand | 1 to 2 | 0.25 mm - 0.5 mm | 19,613 | 47.9 |
| Fine sand | 2 to 3 | 0.125 mm - 0.25 mm | 8,607 | 21.0 |
| Very fine sand | 3 to 4 | 0.062 mm - 0.125 mm | 4,402 | 10.8 |
| Coarse silt | 4 to 5 | 0.031 mm - 0.062 mm | 2,525 | 6.2 |
| Medium silt | 5 to 6 | 0.016 mm - 0.031 mm | 734 | 1.8 |
| Fine silt | 6 to 7 | 0.008 mm - 0.016 mm | | |
| Very fine silt | 7 to 8 | 0.004 mm - 0.008 mm | | |
| Clay | 8 to 10 | 0.001 mm - 0.004 mm | | |

a - from Wentworth (1922)
b - φ = -log2(mean grain diameter in mm)

Figure 3.3. Surficial sediment mean grain size prediction standard error (in φ units, φ = -log2[mean grain diameter in mm]) from kriging
interpolation of the mean grain size data in the Mid-Atlantic Bight. Data courtesy of J. Goff (University of Texas at Austin), derived from
USGS usSEABED database (Reid et al., 2005).

Probabilities for each mean grain size threshold were also mapped with values standardized by the maximum
probability values (Figure 3.5). These maps emphasize areas where the mean grain size has the highest
relative likelihood of exceeding the indicated thresholds. Panels (a) and (b) of Figure 3.5 should be interpreted
with caution, since the highest probabilities are near zero for thresholds of less than -8 φ (boulders and
larger) or less than -6 φ (cobbles and larger).

Model Validation 

Leave-one-out cross-validation of the mean grain size prediction model yielded a root-mean-square error
(RMSE) of 1.4 φ, which was reasonable given the magnitude of grain size measurement error (estimated
to be on the order of 1 φ unit; J. Goff, University of Texas at Austin, personal communication, February 11,
2011) and allowing for unresolved small-scale variance and model specification error. Cross-validation also
indicated that the prediction errors were unbiased (the mean standardized prediction error was near zero) and
that the assessment of prediction uncertainty was valid, since the root-mean-square standardized error was
close to one (Table 3.4).

Although the mean grain size prediction model was mapped at a considerably lower spatial resolution
(30 arc-second grid cells have an average linear dimension of ~800-900 meters in the study area) than the
USGS acoustic backscatter data (4 m grid cells) from Schwab et al. (2002), comparison of the mean grain size
prediction map to the USGS backscatter data provided a qualitative assessment of the accuracy of the
predictions for overlapping areas. In general, areas of high backscatter intensity (lighter shades) were
associated with areas predicted to have coarser sediments, such as the areas labeled A and C in Figure 3.6.
Also, the Hudson Shelf Valley (area labeled B in Figure 3.6) had low backscatter intensity and was predicted
to have finer sediments. These comparisons were consistent with the conclusions of Ferrini and Flood (2006)
and De Falco et al. (2010), who found that acoustic backscatter intensity is generally lower where there are
fine sediments and higher where there are coarse sediments. However, the matchup is clearly not perfect,
and without detailed calibration of backscatter to ground-truth samples it is difficult to say whether deviations
between the two maps are due to inaccurate model predictions or variation in the backscatter surface not
associated with sediment variation.

3.4.2. Sediment Composition 

Model Predictions 

Models of sediment composition provided predictions for most of the study area, from the southern shore of
Long Island, where survey density was greatest, to just past the continental shelf edge (Figure 3.7).

The mud fraction model predicted that seafloor surficial sediments were composed mostly of mud in several
areas, including the Hudson Shelf Valley, in and around Hudson Canyon, along the continental slope, and over
a large swath south of Martha's Vineyard and Rhode Island between the 50 m and 100 m contours (Figure 3.8a).
The sand fraction model predicted that surficial sediments were composed mostly of sand throughout most of
the study area (Figure 3.8b), the exception being those areas predicted to have mud-dominated sediments by
the mud fraction model. The gravel fraction model predicted that surficial sediments were composed mostly of
gravel in only a few small areas (Figure 3.8c). These areas corresponded to those mapped as pebbles or
granules by the model of mean grain size (Figure 3.1). Under Folk's classification scheme (Folk 1954, 1974),
just over half of the study area was mapped as gravelly sand or slightly gravelly sand (Figure 3.8d, Table 3.5).
Another ~30% of the study area was mapped as having a mix of mud and sand with slight amounts of gravel.
An almost negligible area was mapped as having predominantly gravel.

Figure 3.4 (top). Maps of the probability that surficial sediment mean grain size (in φ units, φ = -log2[mean grain
diameter in mm]) is less than threshold values of (a) -8 φ (boulders and larger), (b) -6 φ (cobbles and larger),
(c) -1 φ (pebbles and larger), and (d) 4 φ (sand and larger). Figure 3.5 (bottom). Same as Figure 3.4, with
probability values standardized by the maximum probability of each corresponding threshold map. Values were
adjusted to emphasize areas with the highest probabilities of exceeding each of the mean grain size thresholds.
Data courtesy of J. Goff (University of Texas at Austin), derived from USGS usSEABED database (Reid et al., 2005).

Figure 3.6. Comparison of overlapping portions of the (a) USGS acoustic backscatter data (from Schwab et al., 2002) and (b) surficial
sediment mean grain size prediction (in φ units, φ = -log2[mean grain diameter in mm]). In general, higher backscatter intensity indicates
coarser sediments (e.g., coarse sand, pebbles), while lower backscatter intensity indicates finer sediments (e.g., sand and silt). Mean
grain size data courtesy of J. Goff (University of Texas at Austin), derived from USGS usSEABED database (Reid et al., 2005).

Figure 3.7. Locations of usSEABED sediment survey data in the study area. Data courtesy of USGS usSEABED database (Reid et
al., 2005).

Table 3.4. Cross-validation statistics for the mean grain size model.

| Diagnostic statistic | Value |
|---|---|
| Bias | -0.0007 |
| Root-mean-square error (RMSE) | 1.4090 |
| Mean standardized prediction error | -0.0004 |
| Root-mean-square standardized error | 1.0920 |
Figure 3.8. Surficial sediment composition maps from kriging interpolations of usSEABED sediment composition data in the Mid-Atlantic
Bight: (a) predicted mud percentage, (b) predicted sand percentage, (c) predicted gravel percentage, (d) distribution of predicted
sediment texture classes. Sediment texture classes were assigned based on the ratios of the predicted mud, sand, and gravel fractions
according to Folk (1954, 1974). Data courtesy of USGS usSEABED database (Reid et al., 2005).



Model Uncertainties 

For all three sediment types, model uncertainties were lowest in densely surveyed areas and highest in
unsurveyed areas, particularly farthest offshore of the continental shelf break. Model prediction standard error
was lowest for the gravel fraction model (Figure 3.9); gravel was the sediment type least often encountered in
sediment samples. As with the mean grain size model, given that sediment surveys in areas offshore of the
continental shelf break were sparse and prediction accuracy in these areas is worse than in nearshore areas,
model predictions offshore of the shelf break should be used with caution.

Model Validation 

Cross-validation of the sediment composition models yielded RMSE values ranging from 23% for gravel to
32% for sand (Table 3.6). The somewhat high RMSE values were not surprising given the heterogeneous
nature of the data (laboratory-analyzed samples vs. interpretation of written descriptors) and the potential bias
toward finer particles in some of the sediment surveys (discussed in Goff et al., 2008). Cross-validation
indicated that prediction errors were unbiased (mean standardized prediction errors were near zero) and that
the assessments of prediction uncertainty were valid, since the root-mean-square standardized errors were
close to one (Table 3.6). Given the relatively high RMSE values, all three sediment composition models should
be used with caution and with the knowledge that available data can only provide moderately reliable
predictions at the investigated spatial scales. Although there are discernible broad-scale spatial patterns, the
composition of sediment in any given point sample is highly variable and difficult to predict.



Table 3.5. Total area and percent area of predicted sediment texture classes in the study area.

| Sediment texture class^a | Description^a | Predicted total area (km²) | Predicted percent area |
|---|---|---|---|
| mud | gravel < 0.01%, sand:mud < 1:9 | 61 | 0.1 |
| slightly gravelly mud | 0.01-5% gravel, sand:mud < 1:9 | 408 | 1.0 |
| gravelly mud | 5-30% gravel, sand:mud < 1:1 | 788 | 1.9 |
| slightly gravelly sandy mud | 0.01-5% gravel, sand:mud from 1:9 to 1:1 | 5,158 | 12.6 |
| sandy mud | gravel < 0.01%, sand:mud from 1:9 to 1:1 | 918 | 2.2 |
| sand | gravel < 0.01%, sand:mud > 9:1 | 774 | 1.9 |
| slightly gravelly sand | 0.01-5% gravel, sand:mud > 9:1 | 9,352 | 22.9 |
| gravelly sand | 5-30% gravel, sand:mud > 9:1 | 12,353 | 30.2 |
| slightly gravelly muddy sand | 0.01-5% gravel, sand:mud from 1:1 to 9:1 | 6,812 | 16.7 |
| gravelly muddy sand | 5-30% gravel, sand:mud from 1:1 to 9:1 | 3,488 | 8.5 |
| muddy sand | gravel < 0.01%, sand:mud from 1:1 to 9:1 | 472 | 1.2 |
| gravel | > 80% gravel | | |
| muddy gravel | 30-80% gravel, sand:mud < 1:1 | 11 | |
| muddy sandy gravel | 30-80% gravel, sand:mud from 1:1 to 9:1 | 87 | 0.2 |
| sandy gravel | 30-80% gravel, sand:mud > 9:1 | 211 | 0.5 |

a - from Folk (1954, 1974).



Table 3.6. Cross-validation statistics for the sediment composition models.

SEDIMENT COMPOSITION MODEL
Diagnostic Statistic | Mud Fraction | Sand Fraction | Gravel Fraction
Bias | -0.000004 | -0.000420 | 0.000140
Root-Mean-Square Error (RMSE) | 0.266200 | 0.320800 | 0.227200
Mean Standardized Prediction Error | -0.000076 | -0.000620 | 0.000340
Root-Mean-Square Standardized Error | 0.862400 | 0.943400 | 1.005900

The gravel fraction prediction model was compared to USGS acoustic backscatter data from Schwab et al. (2002) to provide a qualitative accuracy assessment of predictions for overlapping areas. The gravel fraction model was mapped at a considerably coarser spatial resolution (~800-900 m grid cells) than the backscatter data (4 m grid cells). Even so, a general spatial correspondence was observed over broad spatial scales. Areas of high backscatter intensity (lighter shades) were generally associated with areas predicted to be gravelly (A and C in Figure 3.10). The narrow region of low backscatter intensity in the Hudson Shelf Valley (B in Figure 3.10) roughly corresponded to areas predicted to be sand or mud. As noted previously, the match is not perfect. Without detailed calibration of the backscatter to ground-truth samples, it is difficult to determine the cause of deviations between the two maps. They may reflect inaccurate model predictions, or variation in the backscatter surface that is unrelated to sediment variation.
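The four diagnostics in Table 3.6 are standard summaries of leave-one-out cross-validation for a kriging model. They are the mean error (bias), the RMSE, the mean standardized error and the root-mean-square standardized error. The sketch below computes them from observed values, cross-validation predictions and kriging standard errors. It assumes errors are defined as predicted minus observed; the function name and inputs are illustrative and not part of the original workflow. By the usual convention, a mean standardized error near 0 and a root-mean-square standardized error near 1 indicate that the kriging standard errors agree with the observed prediction errors. Values below 1 suggest the standard errors overstate the actual variability.

```python
import math
from typing import Dict, Sequence


def cross_validation_stats(observed: Sequence[float],
                           predicted: Sequence[float],
                           std_error: Sequence[float]) -> Dict[str, float]:
    """Summarize leave-one-out cross-validation results.

    Errors are predicted minus observed, so a positive bias means the
    model over-predicts on average. Standardized errors divide each
    error by the kriging standard error at that location.
    """
    n = len(observed)
    if n == 0 or not (n == len(predicted) == len(std_error)):
        raise ValueError("inputs must be non-empty and equal in length")
    errors = [p - o for o, p in zip(observed, predicted)]
    standardized = [e / s for e, s in zip(errors, std_error)]
    return {
        "bias": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized_error": sum(standardized) / n,
        "rms_standardized_error": math.sqrt(sum(z * z for z in standardized) / n),
    }
```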

3.4.3. Hard Bottom Occurrence 

Model Prediction 

The MaxEnt model output indicated a relatively high likelihood of hard bottom occurrence in nearshore areas and near canyon features just offshore of the continental shelf break (e.g., Hudson Canyon) (Figure 3.11). The model also agreed well with the mean grain size model, predicting a low likelihood of hard bottom occurrence in areas mapped as fine-grained sediment. Note that the model gives no indication of the size of predicted hard bottom features. Its output also does not necessarily relate to the proportion of substrate that is hard bottom at a given location. Rather, the model provides a relative index of the likelihood that at least one hard bottom point would occur if an area were sampled enough times. An area predicted to have a high likelihood of hard bottom occurrence may in fact be dominated by non-hard-bottom substrate. For example, the MaxEnt model predicted a high likelihood of hard bottom occurrence in nearshore parts of the study area, but the mean grain size and sand fraction models suggest these areas are predominantly sandy. Taken together, the models suggest that the nearshore seafloor is composed primarily of sandy sediments. It also has widely distributed (although not abundant) hard bottom components such as large boulders, bedrock, or highly consolidated sediments. This example shows why information from one aspect of sediment character (e.g., mean grain size) should be supplemented with additional information. Doing so gives a more complete characterization of surficial sediment distribution.
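A simplified probability model shows why a high likelihood of occurrence does not imply a high proportion of hard bottom. This is an illustrative assumption, not the MaxEnt formulation. Suppose each sample taken in a grid cell independently hits hard bottom with a fixed probability p, equal to the hard-bottom fraction of the cell. The chance of at least one hit in n samples is then 1 - (1 - p)^n. That chance rises quickly with n even when p is small.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that n independent samples include at least one hard bottom hit,
    assuming each sample hits hard bottom with fixed probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n non-negative")
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float) -> int:
    """Smallest number of samples giving at least `target` chance of a hit."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    n, miss = 0, 1.0
    while 1.0 - miss < target:
        miss *= 1.0 - p  # probability that all samples so far missed
        n += 1
    return n


# A cell that is only 5% hard bottom still yields a hit about 92% of the time
# after 50 samples, so "likely to contain hard bottom" is not "mostly hard bottom".
print(round(prob_at_least_one(0.05, 50), 3))  # 0.923
print(samples_needed(0.05, 0.9))  # 45
```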

The MaxEnt model output indicated that four predictor variables were most important in determining the distribution of hard bottom presences, relative to the other predictors in the set: distance from shore, slope of slope, depth, and signed distance from shelf. Importance here is measured by contribution to regularized model gain (Table 3.7). According to the jackknife outputs, signed distance from shelf, distance from shore, and depth had the greatest individual predictive power (Figure 3.12). The model built without signed distance from shelf showed a marked decrease in gain. This suggested that the predictor was important and not redundant (i.e., not highly correlated with other predictors). Interestingly, surface chlorophyll concentration had low predictive power by itself, but its omission caused the greatest decrease in the predictive power of the model. This suggested that important information may be contained in the interaction of surface chlorophyll concentration with one or more of the other predictors.

Figure 3.9. Standard error maps for (a) predicted mud percentage, (b) predicted sand percentage, and (c) predicted gravel percentage of surficial sediments from kriging interpolations of usSEABED sediment composition data in the Mid-Atlantic Bight. Data courtesy of USGS usSEABED database (Reid et al., 2005).

Figure 3.10. Comparison of overlapping portions of the (a) USGS acoustic backscatter data (from Schwab et al., 2002) and (b) the predicted gravel percentage in surficial sediments. In general, higher backscatter intensity indicates coarser sediments (e.g., gravel) while lower backscatter intensity indicates finer sediments (e.g., sand and mud). Sediment survey data courtesy of USGS usSEABED database (Reid et al., 2005).

Figure 3.11. MaxEnt model of hard bottom occurrence. Map depicts the predicted relative likelihood of hard bottom occurrence from the maximum entropy model. Data courtesy of USGS usSEABED database (Reid et al., 2005), NOS Hydrographic database (NOAA NGDC 2011), and The Nature Conservancy (Greene et al., 2010; J. Greene, The Nature Conservancy, personal communication, March 2011; M. Fogarty, NMFS, personal communication, March 2011).

Figure 3.12. Jackknife test of predictor variable importance for the MaxEnt model of hard bottom occurrence. Blue bars indicate regularized training gain for models built with each predictor individually. Green bars indicate regularized training gain for models built with all other predictors. Compare them with the red bar, which shows the regularized training gain for a model built using all the predictors, to see how much the gain decreases when each predictor is omitted. Predictor variables: bathy = depth, chl = sea surface chlorophyll concentration, dist = distance from shore, meanphi = sediment mean grain size, sdist = signed distance from shelf, slope = slope, slpslp = slope of slope, stddev = bathymetric variance, tur = turbidity.
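The jackknife comparison in Figure 3.12 can be sketched generically. Here `fit_gain` is a hypothetical stand-in for fitting a MaxEnt model on a set of predictors and returning its regularized training gain; it is not a real MaxEnt API. For each predictor, the sketch records three values:

- the gain with that predictor alone (blue bars);
- the gain with all other predictors (green bars);
- the drop relative to the full model (red bar).

A predictor whose effect works mainly through an interaction shows a small "only" gain but a large "drop". This mirrors the surface chlorophyll result described above.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# Hypothetical signature: predictor subset -> regularized training gain.
GainFn = Callable[[FrozenSet[str]], float]


def jackknife_importance(predictors: Sequence[str],
                         fit_gain: GainFn) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Jackknife test of predictor importance.

    For each predictor, report the gain using only that predictor ("only"),
    the gain using all other predictors ("without"), and the loss relative
    to the full model ("drop").
    """
    full = fit_gain(frozenset(predictors))
    results: Dict[str, Dict[str, float]] = {}
    for name in predictors:
        only = fit_gain(frozenset([name]))
        without = fit_gain(frozenset(p for p in predictors if p != name))
        results[name] = {"only": only, "without": without, "drop": full - without}
    return full, results
```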

Model Validation 

The MaxEnt ROC-like analysis indicated good model performance, with a training AUC of 0.832 and a test AUC of 0.730 (Figure 3.13). This means that ~73% of the time, a randomly selected true hard bottom location will have a higher predicted probability of hard bottom presence than a randomly selected location in the study area.
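The AUC interpretation above is the pairwise (Mann-Whitney) form of the statistic. Take the fraction of all (presence, background) pairs in which the presence point receives the higher score, counting ties as one half. For a presence-only model such as MaxEnt, the comparison set is background locations drawn from the study area, not confirmed absences. The scores below are invented purely to show the calculation.

```python
from typing import Sequence


def auc_pairwise(presence: Sequence[float], background: Sequence[float]) -> float:
    """Probability that a random presence point outscores a random
    background point (ties count as half), i.e. the Mann-Whitney form of AUC."""
    if not presence or not background:
        raise ValueError("need at least one presence and one background score")
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


print(auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.4, 0.1]))  # 0.875
```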



Table 3.7: Relative contributions of predictor variables to the MaxEnt model for hard bottom occurrence (based on cumulative regularized gain estimates).

PREDICTOR VARIABLE | PERCENT CONTRIBUTION
distance from shore | 26.4
slope of slope | 17.6
depth | 16.0
signed distance from shelf | 15.0
mean grain size | 8.9
slope | 6.8
surface chlorophyll concentration | 4.1
turbidity | 3.3
bathymetric variance | 1.9
Figure 3.13. Receiver operating characteristic-like (ROC-like) curve for the MaxEnt model of hard bottom occurrence.




The model was mapped at a considerably lower spatial resolution than the USGS acoustic backscatter data from Schwab et al. (2002). Its 30 arc-second grid cells have an average linear dimension of ~800-900 m in the study area, compared with 4 m grid cells for the backscatter data. Even so, comparing the predicted hard bottom occurrence likelihood index map to the USGS backscatter data revealed some qualitative spatial association. Areas of high backscatter intensity (lighter shades) tended to coincide with areas the model predicted to have a high probability of hard bottom presence (Figure 3.14).
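The ~800-900 m figure for a 30 arc-second cell can be checked with simple spherical geometry. The sketch assumes a spherical Earth with a mean radius of 6,371,008.8 m and a representative latitude of 40°N for the study area; both are approximations chosen for illustration. At 40°N a 30 arc-second cell spans about 927 m north-south but only about 710 m east-west, because meridians converge. The geometric mean of the two sides is roughly 810 m.

```python
import math
from typing import Tuple

MEAN_EARTH_RADIUS_M = 6_371_008.8  # spherical approximation (assumed)


def arcsec_cell_size_m(arcsec: float, lat_deg: float) -> Tuple[float, float]:
    """Return (north-south, east-west) extent in metres of a square
    geographic grid cell of `arcsec` arc-seconds at latitude `lat_deg`."""
    side_rad = math.radians(arcsec / 3600.0)
    north_south = MEAN_EARTH_RADIUS_M * side_rad
    # East-west distance shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns, ew = arcsec_cell_size_m(30, 40.0)
print(f"N-S {ns:.0f} m, E-W {ew:.0f} m, geometric mean {math.sqrt(ns * ew):.0f} m")
```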

3.5. Limitations to Interpretation 

Mean Grain Size and Sediment Composition Models 

As with the bathymetry model (Chapter 2), the density of usSEABED survey locations affected the uncertainty estimates of the mean grain size and sediment composition models. In the study area, survey density was highest nearshore and considerably lower in offshore areas. The mean grain size model and sediment composition models seemed to capture meso-scale (10s to 100s of km) spatial patterns across the study area (e.g., predictions of silt in major depositional areas). However, they likely missed finer-scale patterns in sediment distribution, especially where the spatial density of sampling was low. The length scale of features that can be resolved will generally be no shorter than twice the local average distance between samples. Regardless of sample density, the output model grid has a spatial resolution of approximately 800 m, so the minimum length scale of features that can be resolved is approximately 1.6 km. Survey density in nearshore areas may support a model at a finer spatial resolution. However, areas with fewer survey data (such as offshore of the continental shelf break) limit the resolving power of the model. As a result, none of the models presented in this chapter can be used to directly predict the locations of smaller features such as hard bottom patches or cold water coral reefs.
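The "twice the coarser of sample spacing and grid spacing" rule can be made concrete. The sketch below makes two assumptions for illustration: "local average distance between samples" is estimated as the mean nearest-neighbour distance among sample points in projected coordinates (metres), and the rule is applied as 2 x max(sample spacing, grid spacing). With samples 500 m apart on an 800 m grid, the grid controls and the limit is 1.6 km. With samples 2 km apart, sample spacing controls and the limit grows to 4 km.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates, e.g. metres


def mean_nearest_neighbor_distance(points: Sequence[Point]) -> float:
    """Average distance from each sample to its closest neighbour.
    Brute-force O(n^2), adequate for a small illustration."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (x, y) in enumerate(points):
        total += min(math.hypot(x - qx, y - qy)
                     for j, (qx, qy) in enumerate(points) if j != i)
    return total / len(points)


def min_resolvable_length(points: Sequence[Point], grid_spacing: float) -> float:
    """Twice the coarser of local sample spacing and output grid spacing."""
    return 2.0 * max(mean_nearest_neighbor_distance(points), grid_spacing)


dense = [(i * 500.0, j * 500.0) for i in range(3) for j in range(3)]
print(min_resolvable_length(dense, 800.0))  # 1600.0
```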

The reliability of the mean grain size and sediment composition models is also limited by issues with sediment sample processing. As described previously and discussed in detail by Goff et al. (2008), the laboratory analyses used to generate the usSEABED extracted data often exclude hard components like shell and gravel. They may therefore introduce a bias toward finer particles. Goff et al. (2008) found that a simple correction (subtracting 0.5 φ from extracted data) removed the average bias between extracted and parsed mean grain size datasets. We used that bias correction, but it may not apply equally across the whole study area or to all time periods and surveys. The model could still have under-predicted coarse sediments in some areas dominated by extracted data. Moreover, even parsed data may be biased against very large grains (especially cobbles and boulders) if these were excluded by mechanical sampling devices or removed in pre-processing. For the sediment composition models, we used both extracted and parsed datasets without any bias correction for the exclusion of hard components in the extracted data. As a result, these models were likely biased toward finer particles and may under-predict the gravel fraction.
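On the Krumbein phi scale, grain diameter in millimetres is 2^(-φ), so finer particles have larger φ values. Subtracting 0.5 φ from extracted records therefore shifts them toward coarser grain sizes, which offsets the fine bias described above. The sketch applies the correction only to records labelled as extracted. The `source` labels "extracted" and "parsed" follow the text, but the record structure and function names are hypothetical.

```python
import math


def phi_to_mm(phi: float) -> float:
    """Krumbein phi scale: diameter in mm = 2 ** (-phi)."""
    return 2.0 ** (-phi)


def mm_to_phi(diameter_mm: float) -> float:
    """Inverse of phi_to_mm."""
    return -math.log2(diameter_mm)


def correct_extracted_phi(phi: float, source: str, offset: float = 0.5) -> float:
    """Shift 'extracted' mean grain sizes toward coarser values by `offset` phi.
    'parsed' records (and anything else) are returned unchanged."""
    return phi - offset if source == "extracted" else phi


# An extracted mean of 2.5 phi (about 0.18 mm) becomes 2.0 phi (0.25 mm).
print(phi_to_mm(correct_extracted_phi(2.5, "extracted")))  # 0.25
```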

Other issues arise from the long time span over which samples were collected. Samples in the usSEABED 
database were collected over multiple decades, and thus variability in grain size and sediment composition 
data likely includes a temporal component in addition to spatial variability. It is possible that a sample collected 
in 1970 no longer reflects the true state of the seafloor, but it was used in the model as such. Also, positional 
uncertainty for survey locations was considerably greater for data collected decades ago. 

Mean grain size is a suitable measure of sediment character at survey sites where sediment composition has a unimodal distribution, since mean grain size will then represent the typical size of sediment particles in the sample. However, if the distribution of sediments in a sample is bimodal or multi-modal, mean grain size will not indicate the typical size of sediment particles at the survey site. In fact, very few particles in the sample may be this size. For example, the mean grain size of a gravelly mud sample could correspond to sand, even though the sample may contain no sand. The mud, sand, and gravel fraction models can be used in conjunction with the mean grain size model to mitigate this limitation.
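A small worked example shows how a bimodal sample can have a "sand" mean. The sketch uses a simple weight-averaged mean in phi units and coarse Wentworth class limits (gravel coarser than 2 mm, sand 2 to 0.0625 mm, mud finer). It is a simplification of formal moment or graphic mean methods and is not necessarily the method used for usSEABED. Mixing equal weights of gravel at -2 φ (4 mm) and mud at 6 φ gives a mean of 2 φ (0.25 mm). That value falls in the sand range although neither mode is sand.

```python
from typing import Sequence, Tuple


def weighted_mean_phi(components: Sequence[Tuple[float, float]]) -> float:
    """Weight-averaged grain size in phi units from (phi, weight fraction) pairs."""
    total = sum(w for _, w in components)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(phi * w for phi, w in components) / total


def wentworth_class(phi: float) -> str:
    """Coarse Wentworth class: gravel > 2 mm, sand 2-0.0625 mm, mud < 0.0625 mm."""
    if phi < -1.0:
        return "gravel"
    if phi <= 4.0:
        return "sand"
    return "mud"


mean = weighted_mean_phi([(-2.0, 0.5), (6.0, 0.5)])
print(mean, wentworth_class(mean))  # 2.0 sand
```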

Previous maps of sediment composition consisted of point data depicting the dominant sediment type at each location (e.g., Williams et al., 2006). The maps developed here therefore represent a considerable advance over previously available information. However, given the limitations associated with the usSEABED sediment data, the maps of predicted mean grain size and sediment composition fractions should be used primarily to describe general patterns in sediment distribution. Maps of model uncertainty can be used to identify areas where additional survey data are needed and for risk analyses related to decision-making under uncertainty.

Figure 3.14. Comparison of overlapping portions of the (a) USGS acoustic backscatter data (from Schwab et al., 2002) and (b) the predicted hard bottom occurrence likelihood index. Higher backscatter intensity indicates coarser sediments (e.g., gravel) while lower backscatter intensity indicates finer sediments (e.g., sand and mud). Hard bottom data courtesy of USGS usSEABED database, NOS Hydrographic database (NOAA NGDC 2011), and The Nature Conservancy (Greene et al., 2010; J. Greene, pers. comm.; M. Fogarty, pers. comm.).

Hard Bottom Occurrence Model 

The MaxEnt model of the likelihood of hard bottom occurrence performed well in the cross-validation ROC-like 
analysis. However, there are a number of limitations to the interpretation of this presence-only model, and it 
should be considered an experimental product. 
First, the environmental predictor variables used to characterize hard bottom locations were at a considerably
coarser spatial resolution than most hard bottom features on the seafloor (e.g., individual boulders, patches 
of exposed bedrock). As a result, the value associated with a given hard bottom point for a predictor may 
misrepresent the actual value at that precise location. Finer-scale environmental predictor data would enable 
the model to characterize hard bottom locations more accurately in terms of the environmental predictors, and 
as a result model predictions would be more reliable. 

Second, MaxEnt solutions using presence-only data require that sampling effort be distributed homogeneously over the study area, or that biases be known and integrated into the model. We know that sampling effort was heterogeneously distributed and that considerably more samples were collected close to shore, but we do not know exactly how effort was distributed. Because of this concern about sample bias, we excluded a large number of hard bottom locations in nearshore areas. Sample bias can result in model predictions that are overfit to more densely surveyed areas (Phillips et al., 2009; Elith et al., 2011). MaxEnt models are limited by sample bias and cannot identify prevalence. For these reasons, Elith et al. (2011) suggest using presence-absence modeling methods if presence-absence survey data are available. However, reliable hard bottom absence data were not available in the study area.

Third, unlike the mean grain size and sediment composition models, the MaxEnt output did not provide a spatial map of prediction uncertainties. It was therefore not possible to assess changes in prediction certainty associated with the location of input data or its variability. Lastly, the model did not provide any indication of the size of predicted hard bottom features. It indicated only the likelihood of at least one hard bottom point sample occurring at a given location. This likelihood depends on sampling effort, but, as noted previously, the heterogeneity in effort is not explicitly included in the model.

Additional fundamental limitations to models developed from presence-only data are discussed in detail by Elith 
et al. (2011); anyone interested in quantitative application of the hard bottom model results should thoroughly 
read and understand the limitations discussed in that article. 

Previous maps of hard bottom locations in the study area consisted of point data depicting survey locations 
where hard bottom features were observed, with no indication of the relative likelihood of occurrence across 
the study area (Greene et al., 2010). Therefore, in spite of the limitations associated with presence-only 
modeling, the model of hard bottom occurrence developed here provides additional information at unsampled 
locations that has previously been unavailable, and may be useful to spatial planning until detailed hard bottom 
distribution data can be collected. 
4.1. SUMMARY

This chapter compiles data on several dynamic oceanographic variables for the New York study area (Figure 1.2): water column stratification, sea surface temperature (SST), surface chlorophyll, surface turbidity, and near-surface zooplankton biomass. Data are gridded to a common 30 arc-second resolution. Long-term average (climatological) ocean conditions are mapped by season (spring [Mar-May], summer [Jun-Aug], fall [Sep-Nov], winter [Dec-Feb]). These datasets are intended to quantify spatial variation in long-term average patterns of physical and biological oceanographic

4.2. BACKGROUND

The hydrography of the study area is characterized by a strong seasonal cycle, considerable freshwater runoff, 
and interactions among three distinct large-scale water masses. These water masses produce strong spatial 
and temporal heterogeneity in both biological and physical parameters, and define biogeographic regions that 
are often clearly delimited by temperature and/or salinity fronts, although the exact position of these fronts 
shifts seasonally and inter-annually. Over the shelf, water is relatively cold and fresh, and comes from the 
Labrador Current via a continuous equatorward coastal current system (Chapman and Beardsley, 1989). The 
northward-flowing warm equatorial waters of the Gulf Stream pass farther offshore. Between these two and
over the slope lies a water mass, commonly called Slope Water, which is a mixture of shelf water and Gulf
Stream water.




The frontal boundary between shelf and slope masses is highly dynamic and changes due to wind forcing, 
gravitational flow, and large scale alterations in atmospheric circulation patterns such as those associated with 
the North Atlantic Oscillation (NAO) (Pershing et al., 2001). Changes in the relative position of these water 
masses not only affect physical parameters, such as water temperature, but also species distributions. For 
instance, an infamous 1882 tilefish kill offshore of New Jersey has been attributed to colder-than-usual water
temperatures and a low NAO index (Marsh et al., 1999).

Productivity on the shelf is generally nitrogen-limited. It is therefore greatest wherever inorganic nitrogen-containing nutrients are supplied, typically by runoff from rivers and estuaries, turbulent mixing in warm core rings, wind-driven upwelling intrusions of slope water, and intense tidal mixing at shoals. Frontal boundaries and stratification between water masses inhibit mixing, but strong winds, upwelling, and eddies can provide enough energy to promote mixing and introduce nutrients. Upwelling occurs south of Long Island during periods of southwesterly winds and during the passage of storms (Walsh et al., 1978). Warm core rings resulting from Gulf Stream meanders occur, but are less frequent than at Georges Bank (Ingham et al., 1982).

The shelf's water column stratifies in the spring and summer from warming and freshwater inputs. Stratification 
isolates warm, well-mixed surface water from cold deeper water and deprives the upper (euphotic) zone of 
nutrients. During stratification, primary productivity is highest nearshore where periodic coastal upwelling and 
runoff can provide nutrients. Offshore productivity is limited to the pycnocline where phytoplankton can get
nutrients via diffusive fluxes. In late summer, stratification breaks down due to storms and surface cooling. 
By winter the entire water column over the shelf is well-mixed and a sharp frontal zone separates cold, fresh 
nearshore water from warmer, more-saline slope water. 

4.3. METHODS 

In this section, data sources are identified and methods used to interpolate data onto a consistent sampling 
grid are described. All datasets are co-registered on the same 30 arc-second sampling grid used in Chapters 
3 and 6 and clipped to the same study area spatial extent (Figure 1.2). Since sea surface temperature 
(SST), stratification, chlorophyll, turbidity, and zooplankton biomass are time-varying environmental variables 
dominated by seasonal variability, long-term average (climatological) ocean conditions were mapped by 
season (spring [Mar-May], summer [Jun-Aug], fall [Sep-Nov], winter [Dec-Feb]). 
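The seasonal grouping used throughout this chapter can be sketched as a month-to-season lookup followed by a pooled average. December is assigned to winter together with the following January and February. In a long-term climatology the year is ignored, so all winter months from all years are pooled. This is a minimal sketch that assumes each input is a (year, month, value) record and that missing values are None or NaN. It weights every record equally, which may differ from averaging monthly composites first as described for the SST data below.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def seasonal_climatology(
        records: Iterable[Tuple[int, int, Optional[float]]]) -> Dict[str, float]:
    """Pool (year, month, value) records by season and average them,
    skipping missing values (None or NaN). The year is not used."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _year, month, value in records:
        if value is None or math.isnan(value):
            continue
        season = SEASON_OF_MONTH[month]
        sums[season] += value
        counts[season] += 1
    return {season: sums[season] / counts[season] for season in counts}
```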

Data processing was carried out using ArcGIS 9.3.1 with the Spatial Analyst extension (Environmental Systems Research Institute [ESRI], Redlands, CA), the Geostatistical Analyst extension (ESRI), XTools Pro 6.2.1 for ArcGIS 9.x (Data East LLC, Novosibirsk, Russia), and Hawth's Tools for ArcGIS 9.x (Beyer, 2004).

Water Column Stratification 

Seasonal climatologies of water column stratification were obtained from The Nature Conservancy (TNC). They are described in TNC's Northwest Atlantic Marine Ecoregional Assessment (NAMERA) Phase I Report (Greene et al., 2010; Shumway et al., 2010). Stratification estimates were originally provided to TNC by Dr. Grant Law and subsequently provided to us with permission from the original author (G. Law, pers. comm.). Briefly, three-dimensional ocean temperature and salinity data were interpolated from a database of conductivity-temperature-depth (CTD) casts using the OAX5 optimal-analysis algorithm (Hendry and He, 1996). CTD casts came from four sources (for details see Shumway, 2010 and Law, 2011):

- Hydrobase (described in Curry, 1996);
- NOAA National Marine Fisheries Service databases (described in Mountain, 2003);
- Fisheries and Oceans Canada databases (described in Gregory, 2004);
- South Atlantic Bight oceanographic data (described in Blanton et al., 2003).

Stratification was calculated by subtracting the optimally interpolated seawater density (kg·m⁻³) at 50 m from the surface seawater density. The results were then averaged to create a 1980-2007 climatology (Shumway, 2010). By this definition, stratification is usually negative, because less dense (warmer and/or fresher) water lies on top of denser (colder and/or more saline) water. More negative values indicate greater stratification.
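The sign convention can be illustrated with a linear equation of state. The original estimates used optimally interpolated density from CTD data, not this approximation. The reference density, reference temperature and salinity, and the expansion and contraction coefficients below are typical textbook magnitudes assumed for illustration only. A warm, fresh surface layer over cooler, saltier water gives a negative index (about -2.8 kg·m⁻³ here). A fully mixed column gives zero.

```python
# Linear equation of state (textbook approximation, assumed for illustration).
RHO0 = 1027.0        # reference density, kg m^-3
T0, S0 = 10.0, 35.0  # reference temperature (deg C) and salinity
ALPHA = 2.0e-4       # thermal expansion coefficient, 1/deg C
BETA = 7.6e-4        # haline contraction coefficient, 1/psu


def density_linear(temp_c: float, salinity: float) -> float:
    """Approximate seawater density from temperature and salinity."""
    return RHO0 * (1.0 - ALPHA * (temp_c - T0) + BETA * (salinity - S0))


def stratification_index(surface_rho: float, rho_50m: float) -> float:
    """Surface density minus 50 m density; more negative = more stratified."""
    return surface_rho - rho_50m


summer = stratification_index(density_linear(20.0, 32.0), density_linear(10.0, 33.0))
mixed = stratification_index(density_linear(8.0, 32.5), density_linear(8.0, 32.5))
print(round(summer, 2), mixed)
```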

The stratification climatology was provided on a 5 arc-minute grid and bi-linearly resampled to the 30 arc-second model grid. We did not explicitly characterize the accuracy of this dataset. However, previous accuracy assessments of hydrographic data interpolation in this region suggest relative error on the order of 50% (Taylor and Mountain, 2003). The original resolution of this dataset is also coarser than that of any of the other predictors used in this analysis (spacing between sample points is typically 1-10 km or more). Given this and the level of uncertainty, use this layer with caution for planning at scales finer than ~10 km and for any application requiring precise knowledge of stratification at a particular time and place. Better resolution and accuracy might be obtained from more recent data-assimilating numerical ocean models. An improved high-resolution gridded stratification climatology should be a priority for this region.
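Bilinear resampling estimates each fine-grid value from the four surrounding coarse-grid nodes, weighted by relative distance along each axis. A 5 arc-minute grid refined to 30 arc-seconds is a factor of 10. This is a pure-Python sketch that assumes a node-registered grid stored as a list of rows. It is not the ArcGIS implementation, which handles cell-centre registration and projections itself. Note that bilinear resampling adds no new information: the refined layer is only as detailed as the coarse source.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear_sample(grid: Grid, row: float, col: float) -> float:
    """Interpolate `grid` at fractional (row, col) index coordinates."""
    nrows, ncols = len(grid), len(grid[0])
    if not (0 <= row <= nrows - 1 and 0 <= col <= ncols - 1):
        raise ValueError("point outside grid")
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, nrows - 1), min(c0 + 1, ncols - 1)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def refine(grid: Grid, factor: int) -> Grid:
    """Resample a node-registered grid onto nodes `factor` times denser
    (e.g. factor=10 for 5 arc-minute -> 30 arc-second)."""
    out_rows = (len(grid) - 1) * factor + 1
    out_cols = (len(grid[0]) - 1) * factor + 1
    return [[bilinear_sample(grid, i / factor, j / factor) for j in range(out_cols)]
            for i in range(out_rows)]


print(refine([[0.0, 10.0], [20.0, 30.0]], 2))
```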

Stratification in the study area was greatest in the spring and summer (Figure 4.1). In these seasons, stratification was higher over a broad area of the shelf, decreasing toward the shelf edge and eastward toward Nantucket Shoals. In fall, stratification was greatest in the middle of the shelf, with more mixing along shore and to the east and west of the study area. In winter, stratification was low compared to other seasons. Relative to the seasonal mean, it was higher nearshore, especially near the Hudson River and east of Long Island Sound.

Sea Surface Temperature (SST) 

Seasonal climatologies of sea surface temperature (SST) were obtained by averaging monthly composites from the National Aeronautics and Space Administration (NASA) Pathfinder 1.1 km Advanced Very High Resolution Radiometer (AVHRR) SST archive for the Northwest Atlantic region, 1985-2001 (Wolfteich, 2011). The archive is maintained at the University of Rhode Island (URI) and is publicly available via OpenDAP (Cornillion et al., 2003) at the following URL: http://satdat1.gso.uri.edu/opendap/Pathfinder/Northwest_Atlantic/1km/declouded/contents.html. Radiometry data from AVHRR instruments mounted on NOAA satellites were processed using the Pathfinder Algorithm (Casey et al., 2010). Details of the algorithm and processing are provided at the following URL: http://satdat1.gso.uri.edu/opendap/Pathfinder/Pathfinder1km/pathfinder_1km.html

Figure 4.1: Seasonal stratification climatology maps for spring (upper left), summer (upper right), fall (lower left) and winter (lower right). Data courtesy of G. Law (Oregon Health Sciences University), J. Greene (The Nature Conservancy), NOAA Fisheries, Fisheries and Oceans Canada, Woods Hole Oceanographic Institution (Hydrobase), and B. Blanton (University of North Carolina-Chapel Hill).

Data were bi-linearly resampled to the 30 arc-second model grid. Given the quality flags that were applied, the satellite SST estimates are expected to be accurate to approximately ±1 °C (about 5% for the range of temperatures in our region), although accuracy can degrade close to land (Casey and Cornillion, 1999). The long averaging period resulted in gap-free coverage over the study area except in pixels immediately adjacent to land. SST was generally warmer offshore of the continental shelf break than in nearshore areas. It also varied seasonally, with considerable warming of nearshore areas from spring to summer (Figure 4.2). Climatological SST differed little between summer and fall.

Surface Chlorophyll and Turbidity 

Seasonal climatologies of chlorophyll a concentration for 1998-2006 served as a proxy for surface primary productivity. They were extracted from high-resolution (~1.1 km) SeaWiFS satellite data processed using standard NASA Ocean Biology Processing Group (OBPG) reprocessing 5.1 algorithms (Franz and Thomas, 2005). Similarly, seasonal climatologies of normalized water-leaving radiance at 670 nm (Lw-670nm) for 1998-2006 were extracted from the same imagery as a proxy for sea surface turbidity. All SeaWiFS processing was done by the Coastal Oceanographic Assessment Status and Trends (COAST) Branch (NOAA/NOS/NCCOS/CCMA/COAST), following previously documented methods (Franz and Thomas, 2005; Pirhalla et al., 2009). The one addition was a despeckling filter (Gonzalez and Woods, 1992). The accuracy of ocean color imagery has been reviewed extensively (Franz et al., 2007). Under ideal conditions, SeaWiFS error tolerances are <5% for water-leaving radiances and <35% for chlorophyll a; however, errors can be substantially higher in coastal waters (Franz et al., 2007). The long averaging period resulted in gap-free coverage over the study area except in pixels adjacent to land.

Figure 4.2: Seasonal sea surface temperature climatology maps for spring (upper left), summer (upper right), fall (lower left) and winter (lower right). Data courtesy of C. Wolfriech (University of Rhode Island).
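The text states that a despeckling filter was applied but does not name it. A small median filter is one common despeckling choice. The sketch below assumes a 3 x 3 median filter purely for illustration; it should not be read as the COAST Branch procedure. Each pixel is replaced by the median of its neighbourhood, which removes isolated spikes while preserving edges better than a mean filter. Missing pixels (e.g. land) are represented as None and skipped.

```python
from statistics import median
from typing import List, Optional

Grid = List[List[Optional[float]]]


def median_despeckle(grid: Grid, size: int = 3) -> Grid:
    """Replace each pixel with the median of its size x size neighbourhood,
    ignoring missing (None) pixels; windows are truncated at the edges."""
    half = size // 2
    nrows, ncols = len(grid), len(grid[0])
    out: Grid = []
    for i in range(nrows):
        row: List[Optional[float]] = []
        for j in range(ncols):
            window = [grid[r][c]
                      for r in range(max(0, i - half), min(nrows, i + half + 1))
                      for c in range(max(0, j - half), min(ncols, j + half + 1))
                      if grid[r][c] is not None]
            row.append(median(window) if window else None)
        out.append(row)
    return out


speckled = [[1.0, 1.0, 1.0], [1.0, 50.0, 1.0], [1.0, 1.0, 1.0]]
print(median_despeckle(speckled))  # the isolated 50.0 spike is removed
```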

Chlorophyll a concentrations varied by season but showed similar broad-scale spatial patterns (Figure 4.3). Concentrations were highest in the summer and lowest in the winter. In all seasons, concentrations were highest nearshore and in Long Island Sound and low over most of the shelf and offshore of the continental shelf break. Turbidity showed a similar spatial pattern but was highest in the spring (Figure 4.4).
Figure 4.3: Seasonal chlorophyll a climatology maps for spring (upper left), summer (upper right), fall (lower left) and winter (lower right). Data from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) satellite, processed by D. Pirhalla and V. Ransibrahmanakul (NOAA/NOS/NCCOS/CCMA/COAST). Raw SeaWiFS imagery provided by NASA under research/educational agreement with GeoEye, Inc.




Near-surface Zooplankton Biomass 

Point estimates of zooplankton biomass (mean displacement volume per volume of water strained) were obtained from the NOAA National Marine Fisheries Service's (NMFS) Copepod database. The all-taxa zooplankton global compilation was used, available at: http://www.st.nmfs.noaa.gov/plankton/atlas/data_src/copepod-2010 4000000-compilation.txt.

In the study region, we found 3,122 records of zooplankton biomass from 1966 to 2001. These were grouped by season and processed using ordinary kriging (with locally quadratic detrending) to produce a gridded seasonal climatology at the required resolution. We pooled observations over time to estimate the long-term climatological spatial mean. Ordinary kriging was used because, after trend removal, the data exhibited approximately stationary (though geometrically anisotropic) spatial autocorrelation over the study region. Based on leave-one-out cross-validation, ordinary kriging performed better than inverse-distance weighting (IDW). Predictions were not made where the variogram model explained <2.5% of the total variance (i.e., areas far from data points). Relative cross-validation RMSE was 27% averaged across all seasons (spring 35%, summer 20%, fall 24%, winter 27%). Geostatistical analyses were carried out using Geostatistical Analyst for ArcGIS 9.3 (ESRI).

Figure 4.4: Seasonal turbidity climatology maps for spring (upper left), summer (upper right), fall (lower left) and winter (lower right). Water-leaving radiance values are normalized to reflect the fraction of incident light reflected, and thus are dimensionless numbers between 0 and 1. Data courtesy of D. Pirhalla and V. Ransibrahmanakul (NOAA/NOS/NCCOS/CCMA/COAST); imagery was provided by NASA under a research/educational agreement with GeoEye, Inc.
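The kriging-versus-IDW comparison rested on leave-one-out cross-validation. The IDW side of that comparison is easy to sketch. Each observation is withheld in turn and predicted from the others, and the errors are then summarized. "Relative RMSE" is taken here as RMSE divided by the mean observed value. That is an assumption, since the text does not define it. The IDW power of 2 is a conventional default rather than a value from the original analysis. A kriging cross-validation would follow the same loop with a kriging predictor in place of `idw`.

```python
import math
from typing import List, Sequence, Tuple

Obs = Tuple[float, float, float]  # (x, y, value) in projected units


def idw(points: Sequence[Obs], x: float, y: float, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y)."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return value  # exact hit: return the observation itself
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loo_relative_rmse(points: List[Obs], power: float = 2.0) -> float:
    """Leave-one-out RMSE divided by the mean observed value (assumed definition)."""
    if len(points) < 2:
        raise ValueError("need at least two observations")
    sq_err = 0.0
    for i, (x, y, value) in enumerate(points):
        others = points[:i] + points[i + 1:]  # withhold observation i
        sq_err += (idw(others, x, y, power) - value) ** 2
    rmse = math.sqrt(sq_err / len(points))
    mean = sum(v for _, _, v in points) / len(points)
    return rmse / mean
```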

Zooplankton biomass was greatest in the fall, with patches of relatively high biomass south of Long Island and outside the study area offshore of New Jersey (Figure 4.5). In spring, summer, and winter, zooplankton biomass was heterogeneously distributed and showed differing spatial patterns.
Figure 4.5: Seasonal zooplankton climatology maps for spring (upper left), summer (upper right), fall (lower left) and winter
(lower right). Data courtesy of the NOAA National Marine Fisheries Service Copepod database. 
Deep Sea Corals

Dave Packer¹ and Dan Dorfman²,³,⁴

5.1. SUMMARY

Deep sea corals are benthic invertebrates known to inhabit cold and deep waters throughout the globe, including the Atlantic waters offshore of the State of New York. Many are slow growing and long-lived, with complex, branching growth forms. These forms provide valuable habitat for other species, but they also make the corals particularly susceptible to damage from fishing gear and other anthropogenic impacts. The deep sea coral geodatabase of NOAA's Deep-sea Coral Research and Technology Program (DSCRTP) contains 5,619 records of known deep sea coral locations within the New York State offshore study area. Of these, 4,625 are of sea pens and the remaining 994 are stony corals, true soft corals, or gorgonians. The two most abundant species of sea pens are typically found in the soft sediments on the continental shelf. Most of the stony corals in this region are solitary organisms and are often found on soft substrates as well. Many of the true soft corals and gorgonians were found on gravel and rocky outcrops around the continental slope. Several species are found around Hudson Canyon (Figure 5.1); however, there are very few records in the literature of deep sea corals within the canyon itself.
Figure 5.1 White sea pens (Stylatula elegans) on muddy sand from 119 m on the outer continental shelf near Hudson Canyon (left). Solitary hard coral (Dasmosmilia lymani) on shelly sand from 108 m on the rim of Hudson Canyon (right). Photo credit: P.C. Valentine, USGS.



We have a very incomplete picture of deep sea coral distribution and abundance in the northeastern U.S. 
region and offshore of New York, and the overall quantity and quality of deep sea coral habitat is unknown. 
There is also a dearth of information on their natural history, and their taxonomy remains problematic. Deep 
sea corals in this region face a range of anthropogenic threats from fishing, oil and gas drilling, and ocean 
acidification driven by rising atmospheric CO₂. 

To better preserve and protect deep corals and deep coral habitat off the northeastern U.S. and offshore of 
New York, there needs to be: 1) an increased mapping and survey effort; 2) more basic research on deep coral 
taxonomy, life history, habitat requirements, species associations, etc.; and finally, 3) quantification of the 
susceptibility of deep corals to anthropogenic impacts. There are currently efforts underway, under the auspices 
of the New England Fishery Management Council (NEFMC), in coordination with the Mid-Atlantic Fishery 
Management Council (MAFMC), to protect deep sea corals within the New York State offshore study area. 
Several approaches to management and conservation are being evaluated, and the DSCRTP and NOAA's 
National Marine Fisheries Service (NMFS) Northeast Fisheries Science Center (NEFSC) will be conducting a 
three-year regional investigation into deep sea coral distribution, biology, and ecology from fiscal year (FY) 
2013 through FY2015. 

5.2. INTRODUCTION 

Deep sea corals are benthic invertebrates known to inhabit cold and deep waters throughout the globe, including 
the Atlantic waters offshore of the State of New York. While considerable attention has been given to tropical 
and subtropical corals and coral reefs, deep sea corals have only recently been researched and managed for 
their habitat value. Unlike shallow water corals, deep sea corals can be found in cold water habitats throughout 
the globe across a wide range of depths. Deep sea corals can be found from near the surface to about 3,000 m 
depth, although NOAA generally defines them as occurring at depths >50 m on continental shelves, slopes, 
canyons, and seamounts. Deep sea corals are suspension feeders but, unlike most tropical and subtropical 
corals, do not require sunlight and do not have symbiotic algae (zooxanthellae) to meet their energy needs. 
Deep sea corals can occur as small, solitary individuals or as structure-forming corals that provide vertical 
structure above the seafloor that can be used by other species; the latter include both branching corals that 
form a structural framework (e.g., reefs) and individual branching coral colonies. Because deep sea corals are 
slow-growing, long-lived, and often have complex, branching forms of growth, they are highly susceptible to 
anthropogenic impacts (such as those from fishing gear). These life history traits also compromise their ability 
to recover from disturbances over short time periods. 



Deep sea corals in the northeastern U.S. belong to three major taxonomic groups (Figure 5.2). There are the Hexacorals (or Zoantharia), which include the hard or stony corals (Scleractinia); the Ceriantipatharians, which include the black and thorny corals (Antipatharia); and finally the Octocorals (or Alcyonaria), with flexible, partly organic skeletons, which include the true soft corals (Alcyonacea), gorgonians (Gorgonacea, or sea fans and sea whips), and sea pens (Pennatulacea). Scleractinians identified in the northeastern U.S. number 16 species, Antipatharians possibly four or more species, Alcyonaceans nine species, Gorgonaceans 21 species, and Pennatulaceans 21 species. Among all three groups, there appears to be a suite of species (see below) that occurs at depths of less than 500 m (shelf and upper slope), and a separate suite that occurs at depths greater than 500 m (lower slope and rise). One species of hard coral and one alcyonacean occur in water less than 50 m deep.
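The depth groupings described above can be expressed as a simple lookup, for example to tag coral records before mapping them. This is a minimal sketch: the 50 m and 500 m breakpoints come from the text, while the function name, the labels, and the choice to place a record at exactly 500 m in the deeper group are assumptions made for illustration.

```python
def depth_suite(depth_m: float) -> str:
    """Return the depth grouping for a record depth given in meters.

    Breakpoints follow the text: < 50 m (shallower than NOAA's nominal
    50 m cutoff), < 500 m (shelf and upper slope), otherwise lower slope
    and rise. Assigning exactly 500 m to the deeper group is an assumption,
    since the text only says "less than" and "greater than" 500 m.
    """
    if depth_m < 0:
        raise ValueError("depth must be given as non-negative meters")
    if depth_m < 50:
        return "shallower than 50 m"
    if depth_m < 500:
        return "shelf and upper slope"
    return "lower slope and rise"


# Example depths (m): 14, 108 and 119 appear in this chapter; 1500 is hypothetical.
for depth in (14, 108, 119, 1500):
    print(depth, depth_suite(depth))
```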
Figure 5.2. Deep sea coral taxonomy for those species found in the northeastern 
U.S. from Maine to Cape Hatteras including four seamounts (Bear, Physalia, 
Mytilus, and Retriever) off of Georges Bank that lie within the Exclusive Economic 
Zone (EEZ). 



5.2.1. Studies of Deep Sea Corals in the Northeast Region 

Off the northeastern U.S., deep sea corals have been noted since the surveys of Verrill in the 19th century (Verrill, 1862; 1878a, b; 1879; 1884), and they have been reported as fisheries bycatch since that period. Theroux and Wigley (1998) described the distribution of deep sea corals in the northwest Atlantic based on samples taken from 1956 to 1965. They often did not distinguish between taxonomic groups; e.g., stony corals such as Astrangia sp. and Flabellum sp. are lumped together with the various types of anemones in the subclass Zoantharia. Theroux and Wigley (1998) also discussed the soft corals, gorgonians, and sea pens. These were present along the outer margin of the continental shelf and on the slope and rise, and were sparse and patchy in all areas, particularly in the northern section. Theroux and Wigley (1998) found that they were not collected in samples taken at depths < 50 m, and were most abundant between 200 and 500 m. Identified species include gorgonians such as Acanella sp., Paragorgia arborea, and Primnoa reseda (now resedaeformis; see Cairns and Bayer [2005]) and the soft coral Alcyonium sp. Gorgonians and soft corals were collected from gravel and rocky outcrops (Theroux and Wigley, 1998). 

Watling and Auster (2005) noted two distinct distributional patterns for the gorgonians and soft corals in the 
northwest Atlantic. Most are deepwater species that occur at depths > 500 m; these include species of gorgonians 
in the genera Acanthogorgia, Acanella, Anthothela, Lepidisis, Radicipes, and Swiftia, and soft corals in the 
genera Anthomastus and Clavularia. Other species occur throughout shelf waters to the upper continental 
slope and include the gorgonians P. arborea, P. resedaeformis, and species in the genus Paramuricea. Both 
P. arborea and P. resedaeformis are considered widespread off the northeastern U.S.; P. resedaeformis has 
been reported as far south as offshore of Virginia Beach, Virginia. The majority of records for Acanthogorgia 
armata, P. arborea, and P. resedaeformis come from Lydonia, Oceanographer, and Baltimore submarine canyons. 

Dr. Barbara Hecker and her colleagues surveyed the deep sea corals and epibenthic fauna of the continental 
margin and several canyons off the northeastern U.S. in the 1980s via submersible and towed camera sled 
(Hecker et al., 1980, 1983). Corals were denser and more diverse in the canyons, and some species, such 
as those restricted to hard substrates, were found only in canyons, while species that occur on soft substrates 
were found both in canyons and on the continental slope (Hecker and Blechschmidt, 1980). For a complete discussion of 
Hecker's and others' surveys and research on deep sea corals as well as a thorough review of deep sea coral 
presence and distribution in the northeastern U.S., see Packer et al. (2007). 

In the Mid-Atlantic Bight, many of the topographic features characteristic of other deep sea coral habitats 
are absent. The relatively small amount of hard substrate in this area is associated with submarine 
canyons or consists of artificial reefs and shipwrecks. The main physiographic feature off New York State is the 
Hudson Shelf Valley and Canyon, extending from the inner-continental shelf, at about the 40 m isobath, onto 
the continental slope. Sediments over the Mid-Atlantic shelf are fairly uniformly distributed, and are primarily 
composed of sand, with isolated patches of coarse-grained gravel and fine-grained silt and mud deposits 
(Stevenson et al., 2004). Deep sea corals that grow on soft bottom habitats (e.g., sea pens, some stony corals) 
are more common here than in other U.S. regions, especially on the shelf. Deep sea corals have been seen 
on the shelf around Hudson Canyon and in the head of the Canyon (discussed below). 

Although the mid-Atlantic shelf is mostly soft bottom and devoid of major structure-forming deep sea corals, 
in the relatively shallow nearshore region off Delaware and Maryland there are patches of hard bottom 
containing significant stands of the sea whip Leptogorgia virgulata (Gorgoniidae), which may be a northern 
range extension for this species. These hard bottom areas include natural rocky bottom as well as wrecks and 
artificial reefs, at depths as little as 8 m for the wreck/artificial reef areas, and can be less than 16 km from 
shore. These "shallow-dwelling" deep sea corals may not fit into the standard definition of deep sea corals, but 
these habitats are all known to support high densities of species that prefer structure, such as black sea bass 
(Centropristis striata) and tautog (Tautoga onitis), as well as flounder, so they may be important habitats in 
need of protection. Surveys have not yet been conducted to see if similar nearshore coral patch habitat occurs 
adjacent to other mid-Atlantic states like New York, but it seems unlikely that it would be restricted to Maryland 
and Delaware. 

Despite the aforementioned faunal surveys, our knowledge of the temporal and spatial distribution and 
abundance of deep sea corals off the northeastern U.S., as well as some aspects of their basic biology and 
habitat requirements, is severely limited, so their overall population status is difficult to determine. That, along 
with questions about their taxonomy, makes it difficult, if not impossible, to determine whether there have been 
changes in deep coral occurrence or abundance over time. (There is, however, more information on deep coral 
distribution and habitat requirements in Canadian waters, e.g., the Northeast Channel [Mortensen and 
Buhl-Mortensen, 2004].) NEFSC groundfish and shellfish surveys from the Gulf of Maine to Cape Hatteras have 
collected corals as part of their bycatch for several decades, but there are many data gaps (e.g., corals were 
not properly identified or quantified), which preclude using the data to assess any long-term population trends. 



The environmental parameters of Mid-Atlantic Bight deep sea corals are also unknown, but Leverette and 
Metaxas (2005) developed predictive models to determine areas of suitable habitat for P. arborea and 
P. resedaeformis along the Canadian Atlantic continental shelf and shelf break. Several environmental factors, 
including slope, temperature, chlorophyll a, current speed, and substrate, were included in the analysis. Their 
results showed that the habitat requirements differed between the two gorgonians. P. arborea occurred 
predominantly in steeply sloped environments and on rocky substrates, while the habitat for P. resedaeformis 
was more broadly distributed and located in areas with high current speed, rocky substrates, and a temperature 
range of 5-10 °C. 
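The qualitative habitat patterns reported by Leverette and Metaxas (2005) can be illustrated with a deliberately simplified, rule-based screen over gridded environmental layers. This is not their statistical model: only the 5-10 °C range comes from the text, chlorophyll a is omitted, and the slope and current cutoffs, the class name, and the substrate labels are hypothetical placeholders.

```python
from dataclasses import dataclass

# Only the 5-10 degC range comes from the text; the slope and current
# cutoffs below are hypothetical placeholders, not published values.
STEEP_SLOPE_DEG = 10.0
HIGH_CURRENT_M_S = 0.2
PRIMNOA_TEMP_RANGE_C = (5.0, 10.0)


@dataclass
class HabitatCell:
    slope_deg: float      # seafloor slope in degrees
    bottom_temp_c: float  # bottom temperature in degrees C
    current_m_s: float    # near-bottom current speed in m/s
    substrate: str        # e.g. "rock", "gravel", "sand", "mud"


def suitable_paragorgia(cell: HabitatCell) -> bool:
    """Steep, rocky cells: the qualitative pattern reported for P. arborea."""
    return cell.substrate == "rock" and cell.slope_deg >= STEEP_SLOPE_DEG


def suitable_primnoa(cell: HabitatCell) -> bool:
    """Rocky, high-current cells within 5-10 degC: pattern for P. resedaeformis."""
    t_min, t_max = PRIMNOA_TEMP_RANGE_C
    return (cell.substrate == "rock"
            and cell.current_m_s >= HIGH_CURRENT_M_S
            and t_min <= cell.bottom_temp_c <= t_max)


canyon_wall = HabitatCell(slope_deg=25.0, bottom_temp_c=7.0, current_m_s=0.3, substrate="rock")
flat_rock = HabitatCell(slope_deg=2.0, bottom_temp_c=8.0, current_m_s=0.4, substrate="rock")
print(suitable_paragorgia(canyon_wall), suitable_primnoa(canyon_wall))  # True True
print(suitable_paragorgia(flat_rock), suitable_primnoa(flat_rock))      # False True
```

The flat rocky cell passing only the P. resedaeformis screen mirrors the finding that its habitat was more broadly distributed than that of P. arborea.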

There have been some more recent, targeted surveys off of New England using trawls and remotely operated 
vehicles (ROVs). In 2003, 2004, and 2005, surveys were conducted of several seamounts in the New England 
and Corner Rise Seamount chains (the latter is approximately 400 km to the east of the New England Seamount 
chain, and nearly midway between the east coast of the U.S. and the Mid-Atlantic Ridge) funded by NOAA's 
Office of Ocean Exploration and National Undersea Research Program. The cruises were multidisciplinary in 
nature, but the goals included studying the distribution and abundance of deep corals relative to the prevailing 
direction of currents; collecting specimens for studies of reproductive biology, genetics, and ecology; and 
studying species associations. 

5.2.2. The Role of Deep Sea Corals as Habitat 

Deep sea corals provide habitat for other marine life, increase habitat complexity, and contribute to marine 
biodiversity (Lumsden et al., 2007). The role of deep sea corals as possible habitats for fishes has been 
studied in other regions. The corals in the Primnoa, Lophelia, and Oculina genera have been the most studied. 
Several studies have documented that certain fish commonly occur in the vicinity of corals more often than in 
areas without corals. In the northwest Atlantic, this has been noted for redfish in the Northeast Channel near 
Georges Bank (Mortensen et al., 2005). Redfish may take advantage of structure on the bottom as a refuge 
from predation, as a focal point for prey, and for other uses. However, in a survey of habitats in the Jordan 
Basin in the Gulf of Maine containing coral assemblages (primarily from the genera Paragorgia, Paramuricea, 
and Primnoa), Auster (2005) found that densities of redfish were not significantly different between dense coral 
habitats and dense epifauna habitats, although redfish densities in both of these habitats were higher than 
in the outcrop-boulder habitat containing sparse epifauna. While this shows that a habitat without deep corals 
can support fish densities similar to those of a habitat containing corals, Auster (2005) states that it is the actual 
distribution of each habitat type throughout a region that will ultimately determine the role such habitats play 
in the demography of particular species and communities. Deep sea coral habitats are fairly rare in the Gulf of 
Maine, but boulder-cobble habitats containing dense epifauna are not. Auster (2005) suggests that deep sea 
corals do have some effect on the distribution and abundance of fishes, but by themselves may not support 
high-density, unique, or highly diverse fish communities. The corals do provide important structural attributes of 
habitat, but may not be functionally different from structures provided by other dense epifaunal assemblages. 

There are few data available about invertebrate species associations with deep corals off the northeastern 
U.S. More is known about the species associations of deep corals and invertebrates from other regions. 
However, recent research suggests that deep corals are important components of benthic communities, 
providing structure and refuge for various other invertebrate species (e.g., Lumsden et al., 2007; Mosher and 
Watling, 2009). 

5.3. OBJECTIVES 

This chapter will focus on describing the distribution of those deep sea corals found off New York State, 
from nearshore to the continental slope, based on historical surveys and databases. When examining the 
information available on the distribution of corals in the area offshore from New York State it is important to note 
that the historical surveys are far from comprehensive and the taxonomy and identification of many of these 
deep sea corals are open to question, so the presence, distribution, and abundance of these deep sea corals 
should be interpreted with caution. The national DSCRTP and NEFSC will be conducting a three-year intensive 
investigation of the distribution, ecology, and status of deep sea corals off the northeastern U.S. beginning in 
FY2013. 



5.4. METHODS 

The primary source of data used for this analysis was the Cold-water Coral Geographic Database developed 
by the USGS with support from the DSCRTP. The USGS Cold-Water Coral Geographic Database (CoWCoG) 
consolidates the known locations of deep sea corals in the eastern U.S. and provides a tool for researchers and 
managers interested in studying, protecting, and/or utilizing cold-water coral habitats in the Gulf of Mexico and 
western North Atlantic Ocean. The database makes information about the locations and taxonomy of deep 
sea corals available to the public in an easy-to-access form while preserving the scientific integrity of the data. 
The database includes over 1,700 entries, mostly from published scientific literature, museum collections, and 
other databases (Scanlon et al., 2010). 

This database was supplemented with additional records provided by the Watling et al. database (2003), the 
Theroux and Wigley database (1998), the Smithsonian Institution's National Museum of Natural History and 
NMFS NEFSC's National Systematics Lab, the archives of the former National Undersea Research Center, 
surveys by Dr. Barbara Hecker and her colleagues (e.g., Hecker et al., 1980, 1983), Peter Auster of the 
University of Connecticut, and the NEFSC. Many of these records were obtained through a data mining project 
sponsored by the DSCRTP. For the complete list of northeast deep sea coral references, see Packer et al. 
(2007). It should be noted that the distribution maps presented in this chapter show presence only; i.e., they 
only describe where deep corals that could be identified were observed or collected. Because not all areas have 
been surveyed and some specimens were not identified, the true distributions of many of these species 
remain unknown. However, these combined databases represent the best available georeferenced data on the 
presence of deep corals in the northeast region. 
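Because the maps are presence-only, a record count summarizes survey and collection effort as much as coral distribution. The sketch below shows how a combined record list might be clipped to the study area and tallied by order, the kind of summary reported in Table 5.1. The record schema, the function names, and the rectangular bounding box standing in for the study-area boundary are all hypothetical; the actual geodatabase fields and study-area polygon differ.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class CoralRecord(NamedTuple):
    # Hypothetical minimal schema; the real geodatabase fields differ.
    order: str
    lat: float
    lon: float
    depth_m: Optional[float]  # some historical records lack a depth


# Hypothetical rectangle standing in for the study-area boundary.
STUDY_BBOX = (38.5, 41.5, -74.5, -70.5)  # lat_min, lat_max, lon_min, lon_max


def in_study_area(rec: CoralRecord, bbox=STUDY_BBOX) -> bool:
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= rec.lat <= lat_max and lon_min <= rec.lon <= lon_max


def tally_by_order(records: Iterable[CoralRecord]) -> Tuple[int, Counter]:
    """Count presence records inside the study area, in total and per order.

    A zero count means no record, not confirmed absence of coral.
    """
    kept = [r for r in records if in_study_area(r)]
    return len(kept), Counter(r.order for r in kept)


records = [
    CoralRecord("Pennatulacea", 39.8, -72.5, 119.0),
    CoralRecord("Scleractinia", 39.6, -72.3, 108.0),
    CoralRecord("Gorgonacea", 39.3, -72.0, None),
    CoralRecord("Pennatulacea", 43.0, -69.0, 200.0),  # outside the box
]
total, by_order = tally_by_order(records)
print(total, dict(by_order))
```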



Table 5.1: Number of deep sea coral records in the study area, in total and by taxonomic order 

TAXONOMIC ORDER          COUNT 
All Deep Sea Corals      5,619 
Pennatulacea             4,625 
Scleractinia               338 
Alcyonacea                 365 
Gorgonacea                 291 
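A quick arithmetic check confirms that the order-level counts in Table 5.1 sum to the reported total and reproduces the 994 non-sea-pen records cited in the text; it also shows that sea pens account for roughly 82% of all records in the study area.

```python
# Counts transcribed from Table 5.1
order_counts = {
    "Pennatulacea": 4625,
    "Scleractinia": 338,
    "Alcyonacea": 365,
    "Gorgonacea": 291,
}
reported_total = 5619

total = sum(order_counts.values())
non_sea_pens = total - order_counts["Pennatulacea"]
sea_pen_pct = 100 * order_counts["Pennatulacea"] / total

assert total == reported_total, "orders should sum to the reported total"
print(total, non_sea_pens, round(sea_pen_pct, 1))  # 5619 994 82.3
```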



5.5. RESULTS 

There are a total of 5,619 records of known deep sea corals for the study area (Table 5.1). Of these, 4,625 are for sea pens (Pennatulacea; Figure 5.3). The most common and fairly widespread species found in the deeper parts of the continental shelf are Pennatula aculeata (common sea pen) (Langton et al., 1990) and Stylatula elegans (white sea pen). P. aculeata has been reported down to depths of 3,300 m. S. elegans is abundant on the outer shelf off the mid-Atlantic coast (Figure 5.3) and has been found as deep as 800 m. Unlike most other deep sea corals, sea pens live in muddy or other soft sediments, anchored in place by a swollen, buried peduncle. Some species are capable of retracting part or all of the colony into the sediment when disturbed. Observations suggest that sea pens are resistant to physical disturbance, although their growth and population dynamics have not been investigated; because of their ubiquity, they are generally not of concern for biodiversity or ecosystem management. 

The remaining 994 records are observations of hard and soft corals, mostly from the shelf and slope. There are 
338 records of stony corals (Scleractinia; Figure 5.4). These observations occur between 14 and 2,654 meters 
depth. Most of the stony corals in this region are solitary organisms and are often found on soft substrates. 
Species observed included Dasmosmilia lymani (Figure 5.1), Flabellum alabastrum, and Astrangia sp. (most 
likely Astrangia poculata, which can occur in very shallow water, at depths of only a few meters). 

There are 365 records of true soft corals (Alcyonacea; Figure 5.5). These observations were made between 30 
meters and 3,506 meters, with notable concentrations occurring at the shelf edge between 100 and 200 meters 
and again between 2,000 and 3,000 meters depth. Species observed included Capnella florida, Anthomastus 
grandiflorus, Anthomastus agassizii, and Gersemia fruticosa. Gersemia rubiformis is very numerous in 
nearshore records throughout the northeast. 



There are 291 records of gorgonians occurring between 310 and 3,206 meters, predominantly deeper than 
600 meters (Figure 5.6). Species observed included Acanella arbuscula, Acanthogorgia armata, Anthothela 
grandiflora, Lepidisis caryophyllia, P. arborea, Paramuricea grandis, P. resedaeformis, Swiftia casta, and 
Radicipes gracilis. 
Figure 5.4. Observed Scleractinia locations. 



Figure 5.3. Observed Pennatulacea locations. 
Figure 5.5. Observed Alcyonacea locations. 
Figure 5.6. Observed Gorgonacea locations. 



Hudson Canyon 

Deep sea corals have been seen on the shelf around Hudson Canyon and in the head of the Canyon. For example, a survey by Guida¹ of benthic habitats on the shelf around Hudson Canyon in 2001, 2002, and 2004 found the solitary stony coral D. lymani at a number of sites at depths ranging from 100 to 200 m (Figure 5.7). They were particularly abundant, occurring in patches in a narrow band along the canyon's rim near its head at depths of 105-120 m; local densities within those patches exceeded 200 polyps per m², but densities elsewhere were much lower. Other records of deep sea corals around Hudson Canyon can be found in Packer et al. (2007). However, the only evidence of deep corals occurring deep within the canyon itself comes from Hecker and Blechschmidt (1980), who found abundant populations of the soft coral Eunephthya fruticosa (possibly the same species as Gersemia fruticosa), but only in the deeper portion of the canyon. This may be due to the predominance of soft substrate within the Canyon itself, although recent mapping surveys (Guida¹) have found evidence of hard bottom areas that may serve as deep sea coral habitat. 

Figure 5.7. Distribution and approximate densities (polyps per square meter) of the solitary stony coral Dasmosmilia lymani in samples from the Mid-Atlantic shelf around Hudson Canyon (Guida¹). Data obtained from still photos and trawl samples taken during October and November 2001, 2002, and August 2004. 
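Photo-based densities such as the more than 200 polyps per m² reported for D. lymani are, in essence, counts divided by the area of seafloor imaged. The minimal sketch below assumes that every still photo covers the same known footprint; the 0.5 m² frame area, the counts, and the function names are hypothetical, and the image-analysis procedure actually used in the Guida survey is not described in this chapter.

```python
from statistics import mean
from typing import Sequence


def density_per_m2(polyp_count: int, frame_area_m2: float) -> float:
    """Polyps per square meter for one photographed patch of seafloor."""
    if frame_area_m2 <= 0:
        raise ValueError("frame area must be positive")
    return polyp_count / frame_area_m2


def mean_density_per_m2(counts: Sequence[int], frame_area_m2: float) -> float:
    """Average density across frames that share the same (assumed) footprint."""
    return mean(density_per_m2(c, frame_area_m2) for c in counts)


FRAME_AREA_M2 = 0.5  # hypothetical photo footprint
print(density_per_m2(110, FRAME_AREA_M2))                 # 220.0
print(mean_density_per_m2([110, 30, 0], FRAME_AREA_M2))
```

Averaging frames from inside and outside a patch is one reason patch-scale maxima can greatly exceed area-wide means, consistent with the much lower densities reported away from the canyon rim.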



In addition to deep sea corals, there are records of sponges around Hudson Canyon, resulting from a single 
research effort by Guida¹. Structure-forming sponges are expected to play a role similar to that of deep sea corals by 
providing potential habitat for other species. Sponges were observed at 22 locations, with densities ranging 
from 0.1 to > 100/sq dm (Figure 5.8). An effort is currently underway to obtain and document records of sponge 
occurrences off the northeastern U.S. (e.g., from the Smithsonian Institution's database). 

An additional map showing both deep sea corals and sponges in a single image is included (Figure 5.9). 
Figure 5.8. Observed deep sea sponge locations. 



1 Guida, V. 2004. Unpubl. data. NOAA, NMFS, NEFSC, James J. Howard Marine Sciences Laboratory, 74 Magruder Road, Sandy 
Hook Highlands, NJ 07732. 





Figure 5.9. Observed deep sea coral and sponge locations. 



Northeast Region 

Our knowledge of the distribution of deep sea corals for the Northeast Region (Maine - North Carolina) is 
incomplete, so conclusions drawn from existing data should be considered preliminary. The known distribution 
of corals in the region is shown in Figure 5.10. Distribution of corals in the region is similar to the distribution 
found within the study area, with concentrations occurring on the shelf edge and shelf break. 
Figure 5.10. Known deep sea coral locations for the Northeast Region. 




5.6. DISCUSSION 

Deep sea corals provide habitat for other marine life, increase habitat complexity, and contribute to marine 
biodiversity, and their destruction could impact other marine species. Anecdotal data from surveys as well as 
reports from fishermen, who have brought corals up as bycatch since the 19th century, suggest that deep sea 
corals have become less common or their distributions have been reduced due to the impacts of bottom fishing 
(e.g., off New England); fishing has had significant effects on deep sea coral populations in other regions. 
Deep corals are especially susceptible to damage by fishing gear because of their often fragile, complex, 
branching form of growth above the bottom. Also, they grow and reproduce at very slow rates, with some 
estimates on the scale of hundreds of years. Recruitment rates may also be low, which makes their recovery 
from disturbances difficult over short time periods. Of the various fishing methods, bottom trawling has been 
found to be particularly destructive (e.g., Rogers, 1999; Hall-Spencer et al., 2001; Koslow et al., 2001; Krieger, 
2001; Fossa et al., 2002; Freiwald, 2002). 

The effects of current and historic fishing efforts on deep sea coral and coral habitats in the northeastern U.S. 
have not been quantified. The types of fishing gear used in the region include fixed gear, such as longlines, 
gillnets, and pots and traps, as well as trawls and dredges. Fixed gear can be lost at sea, where it can continue 
to damage corals. In Canada, longlines have been observed entangled in deep sea corals such as Paragorgia 
and Primnoa and may cause breakage (Breeze et al., 1997; Mortensen et al., 2005). Bottom trawling was found 
to have a larger impact on deep sea corals compared to longlining (e.g., damage to Primnoa off Alaska [Krieger, 
2001]). The northeastern U.S. fisheries that have the highest likelihood of occurring near concentrations of 
known deep sea coral habitats (e.g., in canyon and slope areas) are the monkfish or goosefish and tilefish 
fisheries, and the red crab and offshore lobster pot fisheries. 

Other potential threats to deep sea corals in this region include possible oil and gas drilling in the deeper parts 
of the shelf, and ocean acidification. Deep sea coral communities may be uniquely vulnerable to changes in 
ocean chemistry associated with ocean acidification due to increased atmospheric CO₂ from the combustion of 
fossil fuels (Guinotte et al., 2006). The ocean acts as the largest net sink for CO₂, absorbing this gas from the 
atmosphere and then storing carbon in the deep ocean. Oceanic uptake of CO₂ drives the carbonate system to 
lower pH and lower saturation states of the carbonate minerals calcite and aragonite, the materials used to form 
supporting skeletal structures in many major groups of marine organisms, including corals (Kleypas et al., 2006). 
This change in ocean chemistry will reduce the ability of corals, particularly stony corals, to lay down calcium 
carbonate skeletons (calcification). There is evidence that the rate of CO₂ increase in the deep ocean has been 
occurring at a pace double that of shallow waters, and therefore the effect of ocean acidification on deep sea 
corals could be significant (e.g., Bates et al., 2002; Guinotte et al., 2006). Carbonate saturation, and with it the 
ability of organisms to calcify, naturally decreases with increasing depth (pressure), decreasing temperature, 
and increasing latitude, which heightens concern for deep sea corals in the near future. There are also areas in 
the ocean where a natural boundary, known as the 'saturation horizon', exists below which organisms may 
have difficulty forming calcium carbonate. This boundary results from the physical factors already mentioned 
that decrease calcification in the deep ocean, but as CO₂ levels increase the saturation horizon will become 
shallower. This would severely limit the distribution of deep sea corals in certain parts of the ocean (The Royal 
Society, 2005). 
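The saturation state of seawater with respect to a carbonate mineral such as aragonite is Ω = [Ca²⁺][CO₃²⁻] / K′sp, where K′sp is the stoichiometric solubility product; Ω < 1 means the water is undersaturated and carbonate tends to dissolve. The saturation horizon is the depth at which Ω falls to 1. The sketch below locates that depth in a vertical profile by linear interpolation and shows how a lower-Ω profile moves it upward. The profile values and the uniform 20% decline are hypothetical stand-ins, not observations from this region.

```python
from typing import Optional, Sequence


def saturation_state(ca_mol_kg: float, co3_mol_kg: float, ksp: float) -> float:
    """Omega = [Ca2+][CO3 2-] / K'sp; values below 1 mean undersaturated."""
    return ca_mol_kg * co3_mol_kg / ksp


def saturation_horizon(depths_m: Sequence[float], omegas: Sequence[float]) -> Optional[float]:
    """Shallowest depth where Omega falls to 1, by linear interpolation.

    depths_m must increase downward; returns None if Omega stays >= 1.
    """
    if omegas[0] < 1.0:
        return depths_m[0]
    for z0, z1, w0, w1 in zip(depths_m, depths_m[1:], omegas, omegas[1:]):
        if w0 >= 1.0 > w1:
            return z0 + (w0 - 1.0) * (z1 - z0) / (w0 - w1)
    return None


depths = [0, 500, 1000, 2000, 3000]
omega_now = [3.0, 2.0, 1.4, 0.9, 0.7]          # hypothetical profile
omega_future = [0.8 * w for w in omega_now]   # crude uniform 20% decline
print(saturation_horizon(depths, omega_now))     # about 1800 m
print(saturation_horizon(depths, omega_future))  # about 1300 m
```

Lowering Ω everywhere raises the depth at which it crosses 1, which is the shoaling of the saturation horizon described above.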

In 2005, NEFMC and MAFMC, with the NEFMC as the lead, approved the designation of Oceanographer 
and Lydonia Canyons (located off New England on the continental slope south of the Georges Bank fishing 
grounds; approximately 116 square nautical miles) as Habitat Closed Areas (HCA) and added these areas to 
the NEFMC's network of HCAs (or marine protected areas). These new HCAs are closed indefinitely to fishing 
with bottom trawls and bottom gillnets in order to minimize the impacts of the monkfish fishery on Essential 
Fish Habitat (EFH) in these deep sea canyons and on the structure-forming organisms therein, including deep 
sea corals. Veatch and Norfolk Canyons are also protected under the Tilefish Management Plan and are 
closed to bottom-tending gear. Recently, a working group of the NEFMC has developed a series of proposals 
for the designation of specific deep sea coral protection zones off the northeastern U.S. using the discretionary 
authorities under the Magnuson-Stevens Act Section 303(b). They have also developed a range of possible 
management options for those zones and suggestions for future research. 



Areas where deep sea corals are present should be considered vulnerable marine ecosystems and efforts are 
underway to extend protection to these valuable natural resources. The NOAA Strategic Plan for Deep Sea 
Coral and Sponge Ecosystems encourages avoidance of adverse impacts of non-fishing activities on deep sea 
coral and sponge ecosystems. Impacts to deep sea coral and sponge ecosystems should be evaluated when 
considering offshore development of energy facilities and infrastructure. 

Packer et al. (2007) outline some of the research priorities for the deep sea corals in this region. To better 
preserve and protect them, there first need to be increased mapping and survey efforts, and more basic 
research is needed on their taxonomy, life history, habitat requirements, water chemistry, species associations, 
etc. Predictive modeling for individual species of deep sea corals has assisted research and monitoring efforts 
in other regions (e.g., Davies and Guinotte, 2011) and is recommended for this region in order to support 
planning and management decisions. There needs to be a better understanding of how anthropogenic impacts 
(e.g., fishing, ocean acidification, oil and gas drilling) affect the deep sea corals of this region. As mentioned 
previously, the DSCRTP and NEFSC are planning to conduct deep sea coral fieldwork off the northeastern 
U.S., including the MAFMC and NEFMC regions, in 2013-15. 
Image 6.1. Cory's Shearwater and Wilson's Storm-Petrels. 
Photo by: David Pereksta, BOEM 



Predictive Modeling of Seabird Distribution Patterns in the New York Bight¹ 

Brian P. Kinlan¹,², Charles Menza¹, and Falk Huettmann³ 

6.1. SUMMARY 

In this chapter we develop and present maps of the seasonal and annual distributions of selected seabird species and species groups in the New York study area (the NY Bight, Figure 1.2). The maps are based on seabird-environment statistical models fit to visual shipboard seabird observational data collected as part of a standardized survey program from 1980 to 1988. Models are developed for single species and for species groups, and then combined to produce "hotspot" maps depicting multi-species abundance and diversity patterns. In addition to a large geodatabase of standardized offshore seabird surveys, the predictive models developed here make use of spatially explicit environmental data products from long-term archival satellite, oceanographic, hydrographic, and biological databases that were developed and discussed in Chapter 4. The seabird distribution maps produced include seasonal and annual relative indices of occurrence and abundance, with associated maps depicting metrics of certainty. All information is mapped on the same 30 arc-second (less than one kilometer) horizontal resolution grid used to characterize surficial sediments and ocean habitat variables in Chapters 3 and 4. 
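The phrase "less than one kilometer" can be made concrete. A 30 arc-second cell spans the same angle in latitude and longitude, but its east-west width shrinks with the cosine of latitude. The sketch below assumes a spherical Earth and uses 40.5°N as a representative latitude for the study area (both are assumptions); it gives cells of roughly 0.93 km north-south by 0.70 km east-west.

```python
import math

# Spherical-Earth approximation (assumption); ellipsoidal values differ slightly.
EARTH_RADIUS_M = 6_371_008.8


def arcsec_cell_size_m(arcsec: float, lat_deg: float) -> tuple:
    """Approximate north-south and east-west size (m) of an arc-second grid cell."""
    angle_rad = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# 40.5 N is an assumed representative latitude for the NY Bight study area.
ns, ew = arcsec_cell_size_m(30, 40.5)
print(f"{ns:.0f} m N-S x {ew:.0f} m E-W, about {ns * ew / 1e6:.2f} km^2 per cell")
```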

High-resolution, contiguous predictive maps of seabird distributions and maps depicting accuracy of model 
predictions were two critical information gaps identified in discussions with New York State's Department of 
State, Ocean and Great Lakes Program. These products are expected to be useful contributions to offshore 
spatial planning, particularly for activities that may affect seabirds or their habitats. 
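The summary notes that single-species models are combined into multi-species "hotspot" maps of abundance and diversity, but it does not specify how. Purely as an illustration (an assumption, not necessarily the procedure used in this chapter), one common approach rescales each species' predicted index to 0-1 so that numerically abundant species do not dominate, averages the rescaled values cell by cell, and counts the species exceeding a threshold as a simple richness measure. Grid cells are flattened to lists here; the species values and the threshold are hypothetical.

```python
from typing import List, Sequence


def rescale_0_1(values: Sequence[float]) -> List[float]:
    """Min-max rescale one species' predictions so species are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hotspot_index(per_species: Sequence[Sequence[float]]) -> List[float]:
    """Cell-by-cell mean of rescaled predictions across species."""
    rescaled = [rescale_0_1(v) for v in per_species]
    return [sum(cell) / len(rescaled) for cell in zip(*rescaled)]


def richness(per_species: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Number of species whose prediction meets a (hypothetical) threshold in each cell."""
    return [sum(v >= threshold for v in cell) for cell in zip(*per_species)]


# Hypothetical predictions for two species over three grid cells
species_a = [0.0, 5.0, 10.0]
species_b = [2.0, 2.0, 4.0]
print(hotspot_index([species_a, species_b]))  # [0.0, 0.25, 1.0]
print(richness([species_a, species_b], 3.0))  # [0, 1, 2]
```

In practice a single threshold applied to raw predictions is crude, because species' indices are on different scales; species-specific cutoffs would be a natural refinement.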

6.2. DEFINITION OF SEABIRDS 

In this chapter we operationally define seabirds as all avian species regularly sighted over marine waters. Given 
this definition, most species included in this chapter belong to the following taxonomic orders: Charadriiformes 
(gulls, terns, auks, phalaropes), Pelecaniformes (gannets, pelicans, and cormorants), and Procellariiformes 
(shearwaters, fulmars, petrels). These species generally derive the majority of their nutrition from marine 
productivity. Some species in the orders Anseriformes (ducks, geese, swans) and Gaviiformes (loons) are 
also included in our operational definition, but most seafaring species in these orders rely only partially on the
marine environment. Many of the species presented in this chapter, especially those preferring more nearshore 
habitats, also fall under various definitions for shorebirds or waterbirds. We also note that although some 
Falconiformes (falcons, osprey) can be pelagic, and some passerine birds can be observed offshore from time 
to time, we did not include them in this study due to lack of data. 
6.3. SEABIRD ECOLOGY IN THE NEW YORK BIGHT 

Seabirds are a conspicuous and ecologically important component of coastal and marine ecosystems. They 
are typically long-lived (15-70 years), move over broad ranges and feed at a variety of trophic levels. As 
such, they are responsive to changes in the marine and coastal environment and can be useful indicators 
of cumulative biological, physical and chemical changes in marine ecosystems. However, because seabirds 
are highly mobile, long-lived, and occupy a dynamic environment, observations and predictive models of their 
spatial and temporal distributions present a formidable challenge (see Section 6.6.). 

The New York study area is located along the Atlantic Flyway, one of four major North American migration routes
for waterfowl, shorebirds, predatory birds, songbirds and seabirds. Along the Flyway, coastal and
marine habitats provide shelter and food for birds at stopover sites between wintering areas (generally to 
the south) and breeding areas (generally to the north, but some species, such as the Great Shearwater, 
Sooty Shearwater, and Wilson's Storm-Petrel, are southern hemisphere breeders). The Flyway is generally 
considered to follow the Atlantic shoreline, but some species like the Manx Shearwater and Arctic Tern migrate 
out at sea, far from land (Guilford et al., 2009; Egevang et al., 2010). 

Most species are temporary residents gathering food in pelagic and coastal habitats as they overwinter during 
the non-breeding season, stop over during migration, or breed during the summer months. The community of
seabirds in the study area is constantly changing and a dominant species in one season may not be observed 
in other seasons. For instance, Wilson's Storm-Petrel is one of the most abundant species in the summer 
months, but is practically absent from surveys in the winter. 

Most species found in the study area breed elsewhere, but at least 10 seabird species breed along New 
York's mainland shores, on Long Island and on a few smaller offshore islands (New York State Ornithological 
Association, Breeding Bird survey). The main breeding season lasts from May to early September (Forbush, 
1929; Bull, 1974; Harrison, 1978), during which time seabirds use the study area to acquire food critical to 
brood success. Breeders usually arrive earlier and stay later than the breeding season itself to prepare for breeding
and migration to overwintering sites. In the maps shown later in this report, the breeding period is best reflected 
by "summer" data. 

Seabirds occupy an assortment of ecological niches and thus exhibit a range of spatial distributions. Some 
species spend the majority of time along coastal shorelines, while others live offshore coming to land only to 
breed. As in terrestrial ecosystems, some marine areas are more important than others; however, few regions
of the ocean are entirely free of seabirds at all times. "Hotspots," or "persistent aggregations" of seabirds, 
defined here as areas where higher-than-average abundance (or diversity) of seabirds is frequently observed, 
are often located where food availability is high and/or where the required effort to obtain food is comparatively 
low. Elevated food availability can be natural (e.g., areas of high ocean productivity) or anthropogenic (e.g., 
areas of fishery discards and human refuse disposal). Hotspots of abundance and/or species diversity may 
also form near breeding areas and along migratory pathways. Due to the strong seasonal signal in the NY 
marine and coastal environment, the location of seabird hotspots is likely to vary not only among species but 
among seasons even for the same species. 
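The operational definition of a hotspot above can be made concrete with a short sketch. The Python below is illustrative only: it applies the verbal definition directly to raw per-survey counts, whereas the hotspot maps in this chapter are built from model predictions (Section 6.8). The data layout (a dictionary of per-survey counts keyed by a hypothetical cell id) and the 0.5 frequency threshold are assumptions, not values from this report, and only abundance (not diversity) is considered.

```python
from statistics import mean


def persistent_hotspots(counts_by_cell, threshold=0.5):
    """Flag cells where above-average counts are observed frequently.

    counts_by_cell: dict mapping a (hypothetical) cell id to a list of
        per-survey seabird counts recorded in that cell.
    threshold: hypothetical minimum fraction of a cell's surveys that must
        exceed the study-wide mean count for the cell to be a hotspot.
    """
    # Study-wide average count over every survey in every cell.
    all_counts = [c for counts in counts_by_cell.values() for c in counts]
    overall_mean = mean(all_counts)

    flags = {}
    for cell, counts in counts_by_cell.items():
        if not counts:
            continue  # unsurveyed cells cannot be assessed
        frac_above = sum(c > overall_mean for c in counts) / len(counts)
        flags[cell] = frac_above >= threshold
    return flags


# Toy counts from four surveys per cell (synthetic).
counts = {
    "A": [10, 12, 0, 15],  # often above average: persistent
    "B": [1, 0, 2, 1],     # rarely above average
    "C": [0, 30, 0, 0],    # one large but transient aggregation
}
hotspots = persistent_hotspots(counts)
```

In this toy example, cell C contains the single largest count but only one above-average survey, so it is treated as a transient aggregation (for example, birds drawn to a fishery discard event) rather than a persistent hotspot.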

Understanding environmental, biological/ecological and anthropogenic processes that affect seabird behaviors 
is critical to understanding seabird distributions. Many studies have shown a strong correlation between seabird 
distribution and biophysical variables, including sea surface temperature, mixed-layer depth (stratification), 
the location of prey and subsurface predators, weather, and distance to nesting sites (e.g., Schneider 1990, 
1997; Ballance et al., 2001; Daunt et al., 2003; Yen et al., 2004; Friedland et al., 2012). In this chapter we take
advantage of some of these environmental relationships to generalize from sighting data to make predictions 
about seabird occurrence and abundance at unsurveyed locations. 

In addition to biophysical variables, seabird communities are tightly linked to many human activities, including 
fishing, hunting, coastal development, shipping, and resource extraction. The present-day seabird community 
structure must be viewed in the context of both past and present conditions and impacts, both human and 
natural. For instance, humans have drastically altered the population levels of Common Terns and Herring
Gulls. Credible reports identify Common Tern nesting sites with hundreds of thousands of birds in the Northwest
Atlantic in the late 19th century (Brewster, 1879). Today the Northwest Atlantic population numbers approximately
40,000 pairs, almost half of which are found in New York (Nisbet, 2002). Common Terns were hunted for use in the
hatmaking industry in the late 19th century, but modern threats include habitat destruction and disturbance,
chemical pollution, rat predation on eggs, and competition with expanding populations of large gulls often fed
by byproducts of human activities (BirdLife International, 2012). In 1900 the U.S. Herring Gull population
numbered 8,000 breeding pairs and was entirely located in Maine. This followed a long period during which the
species was intensively hunted for eggs and feathers (Pierotti and Good, 1994). Now there are more than 100,000
pairs and it is one of the most common species found in the New York study area (Andrews, 1990). The increase
is attributed to protection from hunting, increased waste from fisheries, and decreased competition for small
fish and invertebrates, as human impacts reduced abundances of large top predators (Pierotti and Good, 1994).

Image 6.2. Wilson's and Leach's Storm-Petrels. 
Photo by: David Pereksta, BOEM 

Interactions between fisheries and seabirds have been well documented worldwide, with both increases 
and decreases in regional seabird populations linked to fishing activity (Tasker et al., 2000; Furness, 2003; 
Tasker and Furness, 2003; Votier et al., 2004; Lotze and Milewski, 2004). Distributions are also affected; for 
example, discards from large-scale trawling operations can attract extremely large, but transient, aggregations 
of seabirds. Bartumeus et al. (2010) found that fishery discards distort seabird movement patterns at regional 
scales, and modify the natural way in which seabirds explore the seascape to look for resources. 

6.4. THREATS TO SEABIRDS 

Currently, the greatest threats to seabirds in the region are generally considered to be habitat destruction/alteration,
nesting disturbance, the direct (e.g., mortality in bycatch) and indirect (e.g., overfishing, lights)
impacts of fisheries (Tasker et al., 2000; Votier et al., 2004), other seabirds (Drury, 1965), oil spills, and climate
change (Riou et al., 2011). Since seabirds are mobile and interact with multiple environmental and anthropogenic
stressors over vast geographic distances and long lifetimes, it is important to consider cumulative impacts and 
their synergies in evaluating threats to seabirds. 

Many species migrate long distances between breeding and wintering sites. Along the way they can cross 
multiple ecosystems and geopolitical boundaries. For example, the Arctic Tern migrates an average of 71,000
km a year, crosses two oceans and flies adjacent to four continents (Egevang et al., 2010). Seabird movements
also mean that seabirds are affected by environmental changes at local, regional and global scales. Examples 
of seabird threats that may occur outside of the study area, but impact populations in the study area, include 
watershed and coastal development, predators on nest sites, overfishing, bycatch, oil spills, and climate 
change. 

There is evidence that climate change may pose increasing threats to seabird populations in the future. 
Anthropogenic changes in primary and secondary productivity in the Northwest Atlantic have already propagated 
up the food chain, affecting seabird breeding success (Riou et al., 2011). Anthropogenic effects on food webs 
may interact with natural interdecadal variability in ocean climate and productivity, and with human-driven
regional-to-global scale changes in the ocean-atmosphere system, to cause serious cumulative impacts
(Sandvik and Erikstad, 2008). Links between natural and human climate changes and seabird population 
dynamics are an active area of research (Sandvik and Erikstad, 2008; Riou et al., 2011). 



Potential impacts of offshore alternative energy production facilities (e.g., wind turbines, ocean thermal energy 
conversion, and marine hydrokinetic devices) on marine avifauna are also an active area of investigation. A full 
review of the literature on offshore energy platform effects on seabirds is beyond the scope of this report. For 
useful entry points into the relevant literature, see Drewitt and Langston, 2006; Hueppop et al., 2006; Hatch 
and Brault, 2007; Allison et al., 2008; and Watts, 2010. The effects of offshore wind farms on birds are likely to 
be highly variable, and will depend on a wide range of factors including: the type of construction, the habitats 
affected and the number and species of birds present (Drewitt and Langston, 2006). The principal potential 
impacts are thought to be: collision mortality, displacement of foraging areas and migration routes, and habitat 
change and loss from platform installation, operation and maintenance. Vulnerability to and likelihood of these 
impacts vary in a species-, place- and time-dependent manner (Allison et al., 2008). Spatial information on
seabird abundance, diversity and habitat is therefore essential to reduce potential risks from offshore wind 
development. 



6.5. MANAGEMENT AND CONSERVATION STATUS 

Several domestic laws, Executive 
Orders and international treaties 
provide protection for seabirds. Multiple 
species found in the study area are 
listed by state, federal and international 
conservation listing agencies as species 
of conservation concern because of 
declining or already small populations. 
Table 6.1 lists species identified by 
the U.S. Fish and Wildlife Service as 
species of conservation concern in the 
mid-Atlantic that "without additional 
conservation actions, are likely to 
become candidates for listing under 
the Endangered Species Act" (USFWS, 
2008). 



Table 6.1. Birds of conservation concern identified by the U.S. Fish and Wildlife 
Service for the New England/Mid-Atlantic Coast (Bird Conservation Region 30 
[BCR30]) and birds listed under the Endangered Species Act (ESA). Species 
shaded in grey are commonly observed greater than 10 km from shore. 



Roseate Tern (*)                        Buff-breasted Sandpiper (nb)
Red-throated Loon (nb)                  Short-billed Dowitcher (nb)
Least Tern (c)                          Pied-billed Grebe
Gull-billed Tern                        Horned Grebe (nb)
Great Shearwater (nb)                   Black Skimmer
Audubon's Shearwater (nb)               Short-eared Owl (nb)
American Bittern                        Whip-poor-will
Least Bittern                           Red-headed Woodpecker
Snowy Egret                             Loggerhead Shrike
Bald Eagle (b)                          Brown-headed Nuthatch
Peregrine Falcon (b)                    Sedge Wren
Black Rail                              Wood Thrush
Wilson's Plover                         Blue-winged Warbler
American Oystercatcher                  Golden-winged Warbler
Solitary Sandpiper (nb)                 Prairie Warbler
Lesser Yellowlegs (nb)                  Cerulean Warbler
Upland Sandpiper                        Worm-eating Warbler
Whimbrel (nb)                           Kentucky Warbler
Hudsonian Godwit (nb)                   Henslow's Sparrow
Marbled Godwit (nb)                     Nelson's Sharp-tailed Sparrow
Red Knot (rufa ssp.) (a) (nb)           Saltmarsh Sharp-tailed Sparrow
Semipalmated Sandpiper (Eastern) (nb)   Seaside Sparrow (c)
Purple Sandpiper (nb)                   Rusty Blackbird (nb)

(*) ESA listed, (a) ESA candidate, (b) ESA delisted, (c) non-listed subspecies or population of Threatened or
Endangered species, (nb) non-breeding in this BCR



Species listing under the Endangered Species Act (ESA) of 1973 can have significant impacts on offshore spatial
management decisions. For instance, the ESA prohibits federal agencies from authorizing, funding or carrying out
actions that "destroy or adversely modify" designated critical habitat of species on the federal endangered
species list. This authority applies to all federal waters, and although this regulatory aspect does not apply
directly to State waters, large-scale development projects typically require a federal permit, which brings them
under this provision. Critical habitat protection is only one provision of the ESA. Others include take
prohibitions and a requirement that federal agencies consult on potential adverse effects to listed (threatened
or endangered) species pursuant to Section 7. In addition, non-federal entities are required to get a Section
10(a)(1)(B) permit if their proposed action will result in take of a listed species. A full discussion of the
implications of the ESA for marine and coastal activities that may impact listed bird species is beyond the
scope of this document; the reader is referred to Baur and Irvin (2009) as an entry point and recent review of
ESA law and policy. 



In addition to the ESA, which affects listed species, 
the Migratory Bird Treaty Act (MBTA) of 1918, 
as amended, protects all species of seabirds in 
the U.S. The MBTA makes it unlawful in most 
cases to take, kill, possess, transport or import 
migratory birds, their eggs or their nests. The 
USFWS and Department of Justice are allowed 
enforcement discretion. To date enforcement has 
focused on persons or operations that have taken 
birds with blatant disregard for the law. In general, 
offshore enforcement of both MBTA and ESA 
poses a significant and unresolved challenge. 
The provisions of the MBTA apply equally to 
federal and non-federal entities, except where 
exempted (e.g., selected military activities). Table 
6.1 summarizes the USFWS species of concern 
for the Bird Conservation Region (BCR) including 
the New York Bight. 




Image 6.3. Double-crested Cormorant. 
Photo by: David Pereksta, BOEM 



Additional federal efforts for seabird conservation include: Executive Order 13186 - Responsibilities of Federal 
Agencies To Protect Migratory Birds, which was implemented in 2001, the Fish and Wildlife Conservation Act 
of 1980, and guidelines of the Magnuson-Stevens Fishery Conservation and Management Act of 1976, as 
reauthorized. In the latter, Fishery Management Councils must select measures that, to the extent practicable, 
minimize seabird bycatch and bycatch mortality. In doing so, Councils are advised to consider effects on both 
seabirds and other protected species (e.g., marine mammals, sea turtles). 

New York State's Environmental Conservation Law (ECL) supplements federal laws to help conserve seabirds 
in State waters. For instance, New York maintains its own list of endangered species, threatened species and species
of special concern to supplement the federal list and has similar prohibitions (§ 11-0535). New York also has its own
conservation measures like the Bird Conservation Area (BCA) program. The BCA program integrates bird
conservation interests into agency planning, management and research projects and to date has set aside
52 areas in the State to safeguard and enhance populations in important habitats. These bird conservation
areas have been created for terrestrial and estuarine systems, but not for pelagic systems. 

The New York Department of State (NY DOS) is currently leading an effort to formalize the process of identifying 
significant offshore habitats for wildlife, which could lead to identification of important habitats for seabirds. The 
procedures used to identify, evaluate and recommend areas for protection are being taken from NY State's 
procedures previously used in coastal waters (Ozard, 1984). The data and maps presented in this report are 
intended as a resource to support identification of significant habitats for coastal and pelagic seabirds. 

6.6. CHALLENGES OF UNDERSTANDING SEABIRD DISTRIBUTION AND ABUNDANCE 

Seabirds are highly mobile organisms that range widely and respond to shifting and dynamic features in their 
physical and biological environment at time scales from minutes to years. Developing contiguous distribution 
maps at the relatively fine spatial scales (0.5-5 km horizontal resolution or better) needed for offshore planning 
off NY is a formidable challenge, not least because any discernible long-term average spatial patterns must be 
inferred from incomplete observations on a process with a tremendous amount of inherent variation. 

Traditionally, offshore seabird distribution data have been represented by atlases using spatially aggregated
observations at coarse spatial resolutions of 10-15 kilometers or greater (Jespersen, 1924; Moore, 1951;
Powers, 1983). Often, non-quantitative data (sightings collected anecdotally outside of standardized surveys)
are incorporated into these general descriptions of a species' range. Such coarse descriptions are helpful
for some regional monitoring, assessment and planning purposes, but are inappropriate for detailed spatial
planning decisions. Increasingly, state and local managers are being asked to make resource management
decisions at much finer resolutions. For instance, the U.S. Department of the Interior (DOI) Bureau of Ocean
Energy Management (BOEM) divides the Outer Continental Shelf (OCS) into leasing blocks for energy leasing
purposes that are approximately 5x5 kilometers (length x width), and it is anticipated that areas as small
as 1/16th of a lease block could be leased. Maps in most seabird atlases are too coarse to differentiate
seabird communities in 5x5 km lease blocks, let alone the smaller areas that are being considered. One of the
objectives of this chapter is to develop maps at the spatial scales that managers are using to make decisions
in the waters offshore of NY. 
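The scale mismatch described above can be checked with simple arithmetic. The sketch below uses the approximate figures quoted in this chapter (a ~5x5 km lease block, leasable areas as small as 1/16th of a block, a 10 km atlas cell at the fine end of the 10-15 km range, and the 0.927 x 0.714 km grid cell from Section 6.8.1); it is a back-of-the-envelope illustration, not a BOEM specification.

```python
LEASE_BLOCK_KM2 = 5.0 * 5.0           # approximate lease block size quoted in the text
SUB_BLOCK_KM2 = LEASE_BLOCK_KM2 / 16  # smallest area anticipated to be leased
ATLAS_CELL_KM2 = 10.0 * 10.0          # fine end of the 10-15 km atlas resolutions
GRID_CELL_KM2 = 0.927 * 0.714         # 30 arc-second cell in the study region


def scale_comparison():
    """Return how many units of one map scale fit inside another."""
    return {
        "lease_blocks_per_atlas_cell": ATLAS_CELL_KM2 / LEASE_BLOCK_KM2,
        "sub_blocks_per_atlas_cell": ATLAS_CELL_KM2 / SUB_BLOCK_KM2,
        "grid_cells_per_lease_block": LEASE_BLOCK_KM2 / GRID_CELL_KM2,
        "grid_cells_per_sub_block": SUB_BLOCK_KM2 / GRID_CELL_KM2,
    }


for name, value in scale_comparison().items():
    print(f"{name}: {value:.1f}")
```

A single 10 km atlas cell therefore assigns one value to about 4 lease blocks, or 64 of the smallest leasable areas, while each smallest leasable area still spans roughly two to three cells of the 30 arc-second grid.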

Improving the spatial resolution of predictive maps of seabird distribution requires dealing with two issues 
that arise at fine spatial scales: data gaps, and the inevitable increase in uncertainty associated with making 
predictions at finer spatial scales. To address the problem of discontinuous data (data gaps), we adopt a 
predictive spatial statistical modeling approach. This approach takes advantage of the increased availability 
of biophysical data at fine spatial resolutions and over broad spatial extents (see Chapters 2, 3, 4), along with 
statistical modeling techniques that analyze and generalize the spatial information contained in observations 
(see Appendix 6.A). We combine regression (generalized linear modeling [Fox, 2008]) and geostatistical (kriging 
[Cressie, 1993; Chiles and Delfiner, 1999]) approaches in a statistical modeling framework to predict the
long-term average probability of occurrence and relative abundance of a variety of seabird species and groups. In
this approach, both seabird-environment linkages and spatial autocorrelation are used to make predictions 
about unobserved locations from the scattered available data. We note that our approach does not attempt 
to predict the location of individual birds at a particular time. Rather, we model the long-term average pattern 
(called the 'spatial climatology') of seabird occurrence and abundance. The spatial climatological approach is a 
useful way to map persistent patterns in the distribution of dynamic organisms (e.g., Santora and Reiss, 2011), 
and is less data-intensive than individual-based approaches. 
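To illustrate how the regression and kriging components combine, the following is a minimal regression-kriging sketch in Python with NumPy. It assumes a Gaussian GLM with identity link (that is, ordinary least squares) on a transformed relative-abundance response, and simple kriging of the regression residuals with an exponential covariance model whose sill and range are fixed by hand. These are simplifying assumptions: the model families, covariates and variogram estimation actually used in this chapter are described in Appendix 6.A and are not reproduced here. All data in the example are synthetic.

```python
import numpy as np


def design(X):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(X)), X])


def fit_trend(X, y):
    """Gaussian GLM with identity link, i.e. ordinary least squares."""
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta


def distances(a, b):
    """Euclidean distances (km) between rows of a and rows of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def exp_cov(d, sill, range_km):
    """Exponential covariance model C(d) = sill * exp(-d / range)."""
    return sill * np.exp(-d / range_km)


def simple_krige(obs_xy, resid, new_xy, sill, range_km, nugget=1e-6):
    """Simple kriging (known zero mean) of regression residuals."""
    K = exp_cov(distances(obs_xy, obs_xy), sill, range_km)
    K[np.diag_indices_from(K)] += nugget                     # small nugget for stability
    k = exp_cov(distances(obs_xy, new_xy), sill, range_km)   # n_obs x n_new
    w = np.linalg.solve(K, k)                                # kriging weights
    pred = w.T @ resid
    var = sill - np.sum(k * w, axis=0)                       # simple kriging variance
    return pred, var


def regression_kriging(obs_xy, obs_X, y, new_xy, new_X, sill, range_km):
    """Covariate trend plus kriged residuals; variance ignores trend error."""
    beta = fit_trend(obs_X, y)
    resid = y - design(obs_X) @ beta
    r_pred, r_var = simple_krige(obs_xy, resid, new_xy, sill, range_km)
    return design(new_X) @ beta + r_pred, r_var


# Synthetic example: 40 transect centroids in a 50 x 50 km box, one covariate.
rng = np.random.default_rng(0)
obs_xy = rng.uniform(0.0, 50.0, size=(40, 2))
obs_X = rng.normal(size=(40, 1))
y = 1.0 + 0.5 * obs_X[:, 0] + rng.normal(scale=0.3, size=40)

new_xy = np.array([[25.0, 25.0], [200.0, 200.0]])  # inside vs far from the data
new_X = np.array([[0.0], [0.0]])
pred, var = regression_kriging(obs_xy, obs_X, y, new_xy, new_X,
                               sill=0.09, range_km=10.0)
```

The kriging variance is reduced near dense observations and rises to the sill far from them, where the prediction falls back on the covariate trend alone; this illustrates how prediction uncertainty can vary over space. The sketch ignores uncertainty in the regression coefficients themselves.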

A second challenge of fine-resolution mapping of a dynamic, incompletely sampled spatial process like seabird 
abundance is characterizing and conveying uncertainty. Uncertainty in predictive maps of a dynamic living 
resource (e.g., seabird populations) comes from several sources, including sampling, measurement error, and 
the intrinsic dynamics of the resource itself (e.g., migration, mortality, reproduction) in conjunction with natural 
and anthropogenic changes in the environment. When environmental variables are used as predictors, they 
are often also measured with uncertainty, and/or may only serve as indirect proxies of the underlying driving 
mechanisms (e.g., water column stratification may correlate with prey availability). It is important to understand 
and communicate uncertainties to ensure there is an awareness of the limitations inherent in the use of static 
maps to represent dynamic resources. Our choice of the regression-geostatistical modeling framework allows 
us to produce maps quantifying how uncertainty varies over space. We also employ a battery of diagnostic 
statistics (e.g., error magnitude, prediction skill, receiver operating characteristic [ROC] curve analysis, and 
cross-validation statistics) to assess model accuracy, validity, and performance. Cross-validation diagnostic 
statistics are particularly important because 
they provide an integrated assessment of model 
performance given uncertainties and possible 
violations of model assumptions. 
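Two of the diagnostics mentioned above, ROC curve analysis and cross-validation, can be sketched in a few lines of Python. The area under the ROC curve (AUC) below is computed with the Mann-Whitney formulation: the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site. Folds are stratified so each contains both presences and absences. The fold scheme, the toy `fit_sign`/`predict_sign` model and the data are hypothetical illustrations, not the procedures in Appendix 6.A.

```python
import random


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing presences and absences separately."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(fit, predict, xs, labels, k=5, seed=0):
    """Mean held-out AUC: fit on k-1 folds, score the remaining fold."""
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([xs[i] for i in train], [labels[i] for i in train])
        scores = predict(model, [xs[i] for i in held_out])
        aucs.append(roc_auc(scores, [labels[i] for i in held_out]))
    return sum(aucs) / len(aucs)


def fit_sign(xs, labels):
    """Toy model (hypothetical): does a higher covariate favour presence?"""
    pres = [x for x, lab in zip(xs, labels) if lab == 1]
    absn = [x for x, lab in zip(xs, labels) if lab == 0]
    return 1.0 if sum(pres) / len(pres) >= sum(absn) / len(absn) else -1.0


def predict_sign(sign, xs):
    """Score each site by its covariate value, oriented by the fitted sign."""
    return [sign * x for x in xs]


# Synthetic covariate values and presence (1) / absence (0) labels.
xs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6, 0.05, 0.95]
labels = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
cv_auc = cross_validated_auc(fit_sign, predict_sign, xs, labels, k=5)
```

An AUC of 0.5 indicates no ability to discriminate presence from absence sites and 1.0 indicates perfect discrimination. The toy data are perfectly separated, so they yield a cross-validated AUC of 1.0, which real survey data would not be expected to reach.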

6.7. SUMMARY OF PREVIOUS STUDIES 
RELEVANT TO THE STUDY REGION 

Much is known about birds in inland, estuarine and coastal habitats of New York, but less information
is available for offshore habitats. Collecting data in offshore habitats is expensive and few datasets
provide a comprehensive depiction of the seabird community over many years, seasons and places. The data
used for this report come from the Manomet Bird Observatory (MBO) Cetacean and Seabird Assessment Program
(CSAP) database, which contains species-specific sightings from multiple years (1980-1988). Spatially,
temporally, and taxonomically, the MBO CSAP database is the most comprehensive single available information
source on seabird distribution for the study area.

Image 6.4. Common Tern. 
Photo by: David Pereksta, BOEM 

Image 6.5. Red Phalarope. Photo by: David Pereksta, BOEM 

Two more recent regional offshore seabird survey efforts are underway that will likely eventually surpass the
MBO CSAP in spatial and temporal intensity of effort: seabird observations conducted on NOAA Fisheries Service
(NMFS/NEFSC) Ecosystem Monitoring and Herring Acoustic Survey cruises regularly from 2006 to present (contact:
Dr. Richard Veit, Professor, City University of New York, College of Staten Island, richard.veit@csi.cuny.edu),
and AMAPPS (Atlantic Marine Assessment Program for Protected Species) surveys conducted starting in 2010
(contact: Mr. Michael Simpkins, Branch Chief, NOAA/NEFSC/READ/Protected Species Branch,
michael.simpkins@noaa.gov). It is recommended that New York State closely monitor the progress of these efforts
and use these data to assess possible changes in seabird distribution and abundance since the period in the
1980s when the MBO CSAP data were collected. 

As part of an ongoing study funded by the U.S. DOI Bureau of Ocean Energy Management (BOEM), the USGS 
Patuxent Wildlife Research Center (PWRC) has been assembling a compendium of offshore avian survey data 
in the U.S. Atlantic (O'Connell et al., 2009; Spiegel and Johnston, 2011; see also
http://www.boemre.gov/eppd/PDF/EPPDStudies/CompendiumAvianlnformation.pdf). The Atlantic Seabird Compendium (ASC) currently
includes >250,000 seabird occurrence records from >80 datasets, including all of the datasets identified above. 
USGS plans to make most of this database publicly available in the near future through the US Fish & Wildlife 
Service (USFWS), although there may be some limitations on distribution due to the proprietary nature of some 
datasets. Metadata and summary data are already available (Spiegel and Johnston, 2011; USGS, 2012). 

Other resources for seabird data include online databases such as the Ocean Biogeographic Information
System Spatial Ecological Analysis of Megavertebrate Populations (OBIS-SEAMAP; http://seamap.env.duke.edu/)
and the Global Biodiversity Information Facility (GBIF; http://www.gbif.org/). In the preparation
of this report we have extensively searched all of these resources. Again, at the time of publication, despite its 
age, the MBO CSAP represents the most comprehensive, long-term single database available to characterize 
the marine avifauna of the NY region. 

Payne et al. (1984) used an early version of the MBO CSAP database to analyze the distribution and abundance
of seabirds in 13 subregions from Nova Scotia to Cape Hatteras. Their report provided seasonal density 
estimates of seabirds at coarse spatial scales (e.g., Southern New England Inner shelf, Mid-Atlantic Outer 
Shelf), which may be useful for regional ocean planning. Hoopes et al. (1993) presented summary analyses 
based on the full 1980-1988 MBO CSAP dataset. More recently Huettmann and Diamond (2000, 2001), and 
Pittman and Huettmann (2006) used the MBO CSAP dataset and additional sighting information collected in 
Canadian waters (the Programme Intégré de Recherches sur les Oiseaux Pélagiques [PIROP] database) to
assess seabird distribution patterns at medium resolution (5 arc-minutes or ~9 km). The reader is referred 
to Huettmann (2000) and Pittman and Huettmann (2006) for a detailed characterization of the distribution of 
survey effort and other background information on the PIROP and MBO CSAP databases. Huettmann and 
Diamond (2000, 2001) tracked migration patterns of nine species in the northwest Atlantic Ocean, including 
some parts of the New York study area. 

Pittman and Huettmann (2006) derived spatially explicit and quantitative information on the distribution and
diversity of seabirds within the Gulf of Maine, as part of an ecological assessment for the Stellwagen Bank
National Marine Sanctuary. They used a boosted regression tree (BRT) approach to develop statistical models
and produce medium resolution (~9x9 km grid) predictive maps of seabird relative prevalence. In the present
chapter, we use a generalized linear model (GLM)-based regression approach instead of the BRT approach, use
a wider set of environmental covariates, use relative abundance information as well as relative prevalence,
and employ a different statistical framework (regression-kriging) that allows us to produce finer spatial
resolution predictions by accounting for spatial autocorrelation.

Image 6.6. South Polar Skua. 
Photo by: David Pereksta, BOEM 

It should be noted that a smaller project by the New York State Energy Research and Development Authority
(NYSERDA) used the MBO dataset to assess potential avian impacts of a proposed Long Island - New York City
offshore wind project (NYSERDA, 2010). Their report provides a list of species and a review of their biology
and potential impacts, but does not map or assess distribution. 

We also note that a variety of datasets other than the MBO CSAP exist for coastal areas near the study 
area, including: surveys of Long Island's beach-nesting shorebird habitats (NY and TNC, 1991), New York's 
Christmas Bird Count, the North American Breeding Bird Survey, The Audubon Society's eBird counts, and 
compilations of surveys by the US Fish and Wildlife Service. The Nature Conservancy used New York State 
and US Fish and Wildlife surveys to identify critical habitats for Roseate Terns, Least Terns, Piping Plovers 
and Harlequin Ducks along Long Island's shores. These studies may be useful for nearshore spatial planning, 
although it is important to realize they do not provide information on the presence or absence of birds outside 
of nearshore areas. 

Rhode Island and New Jersey, two states adjacent to New York, have conducted systematic studies of their 
offshore waters to establish ecological baselines for use in coastal and marine spatial planning (Paton et 
al., 2010; GMI, 2010). These studies are recent, reliable and extend over offshore habitats adjacent to their 
corresponding States, some of which partially overlap but do not cover the New York study area. 

Finally, we are aware of three additional datasets that overlap very small portions of the study area. The 
Massachusetts Audubon Society conducted surveys from 2002 to 2006 to assess the potential effect of wind 
farm development on avifauna in Nantucket Sound. Surveys were conducted from fixed-wing aircraft at 500 
feet above the water surface and by boat. These data are proprietary but are contained in the USGS ASC 
database mentioned above (O'Connell et al., 2009). The Minerals Management Service (now the Bureau 
of Ocean Energy Management) funded a winter survey of coastal mid-Atlantic seabirds that extended from 
southern Virginia to the northern border of New Jersey. This survey covered from the coast offshore to at least 
12 nautical miles, and was conducted in 2001-2002 and 2002-2003. These data are public and included in 
the USGS ASC database (O'Connell et al., 2009). As part of studies related to the Cape Wind project, Energy 
Management, Inc. commissioned boat and aerial surveys of Nantucket Sound from 2002 to 2004. These data 
are proprietary and have not been publicly released, but metadata are available in the USGS ASC database 
(O'Connell et al., 2009). 



6.8. METHODS 

This section provides 
an overview of how the 
MBO CSAP seabird 
database (Figure 6.1) was 
used, in conjunction with 
environmental covariates, to 
fit spatial statistical models 
and produce predictive maps 
of seabird distribution and 
relative abundance in the 
NY region representative of 
the period from 1980-1988. 
Figures 6.2 and 6.3 provide 
an outline and example of 
the key processing steps. 
Tables 6.2, 6.4 and 6.5 
and Figures 6.1 and 6.4 
summarize the input data. 
The statistical methods 
used are covered in greater 
detail in Appendix 6.A and 
Online Supplement 6.2. The 
sources and processing of 
the environmental covariate 
data are covered in greater 
detail in Chapter 4, Appendix 
6.B and Online Supplement 
6.1. 
Figure 6.1. Locations (centroids) of all unique MBO CSAP seabird survey transects sampled 
between April 1980 and October 1988 in a) Spring, b) Summer, c) Fall, and d) Winter. 



6.8.1. Study region and grid 

The same study region and 30 arc-second geographic grid described in Chapter 4 of this report were used for all seabird environmental variables and model predictions. The 30 arc-second grid has a north-south linear dimension of 0.927 km and an average east-west linear dimension of 0.714 km in the study region (geometric mean = 0.814 km). In terms of cell area, this resolution is approximately 400 times finer than previous 10 arc-minute maps and 100 times finer than 5 arc-minute maps (20 and 10 times finer, respectively, in linear dimension). For simplicity, we have chosen to use decimal degrees to keep track of grid cell centroids, and to measure distances using a simple elliptical geodetic approximation; the effects of this simplifying assumption are negligible given the size of our study region and grid configuration (potential errors
in linear distances <50% of grid cell horizontal resolution). 
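As a rough check on the cell dimensions quoted above, the sketch below computes the size of a 30 arc-second cell on a spherical Earth (mean radius 6371 km, an assumption; the report's elliptical approximation may use other constants) and an equirectangular distance of the kind the text describes. The latitude of 39.6°N is also an assumption, chosen only because it reproduces the reported 0.714 km east-west dimension; the function names are illustrative.

```python
import math

# Assumption: spherical Earth with a mean radius of 6371 km.
EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0   # about 111.19 km

def cell_dimensions_km(lat_deg, cell_arcsec=30.0):
    """North-south and east-west size (km) of a geographic grid cell."""
    cell_deg = cell_arcsec / 3600.0
    ns_km = cell_deg * KM_PER_DEGREE
    ew_km = ns_km * math.cos(math.radians(lat_deg))   # meridians converge
    return ns_km, ew_km

def approx_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular distance; adequate over short, mid-latitude distances."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = (lon2 - lon1) * KM_PER_DEGREE * math.cos(mean_lat)
    dy = (lat2 - lat1) * KM_PER_DEGREE
    return math.hypot(dx, dy)

ns, ew = cell_dimensions_km(39.6)   # illustrative latitude (assumption)
print(f"N-S {ns:.3f} km, E-W {ew:.3f} km, geometric mean {math.sqrt(ns * ew):.3f} km")
# Cell-area ratios versus 10 and 5 arc-minute grids (600 and 300 arc-seconds):
print((600 / 30) ** 2, (300 / 30) ** 2)   # 400.0 100.0
```

The area ratios are simply the squared ratios of cell widths, which is why a 20-fold change in linear resolution corresponds to a 400-fold change in cell area.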

6.8.2. Seabird survey data 

Seabird sightings data for the study region were extracted from the Manomet Bird Observatory's (MBO) 
Cetacean and Seabird Assessment Program (CSAP) database. The former MBO is currently named the 
Manomet Center for Conservation Sciences (MCCS). This is one of the largest pelagic seabird data sets in 
the world, providing exceptional spatial and temporal resolution for the Northwest Atlantic for the period from 
1980-1988. As discussed above (Section 6.7.), this is the best currently available single source for seabird 
abundance and distribution data in the study region; other, smaller and/or more narrowly focused datasets 
exist, and future efforts to synthesize those data could result in improved coverage and accuracy in specific 
areas and/or for certain species. The MBO database was provided directly by staff at the MCCS to the NOAA 
Biogeography Branch in 2006 (Pittman and Huettmann, 2006). We extracted only quantitative survey data 
(i.e., data from fixed time, standardized surveys as described below) from the CSAP database. The following 
filtering criteria were used: OBTYPE=1 (quantitative surveys only), ANTYPE=1 (seabirds) or (confirmed absence of any seabird species), and transect centroid falling on or within the study area shown in Chapter 1, Figure 1.2. Field names refer to fields in the digital database outputs supplied to us by MCCS.
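The filtering criteria above can be sketched as a simple record filter. This is a hypothetical illustration, not the processing code used for this report: the record layout, the bounding box standing in for the study-area polygon of Figure 1.2, and the `NO_BIRDS` flag used to mark confirmed-absence records are all assumptions, since the CSAP code for absence records is not given here.

```python
def in_bbox(lat, lon, bbox):
    """True if (lat, lon) lies on or within bbox = (south, west, north, east)."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east

def select_records(records, in_study_area, is_absence_record):
    """Apply the OBTYPE / ANTYPE / study-area filters to CSAP-style records."""
    kept = []
    for rec in records:
        if rec["OBTYPE"] != 1:                         # quantitative surveys only
            continue
        if rec["ANTYPE"] != 1 and not is_absence_record(rec):
            continue                                   # seabirds, or confirmed absence
        if not in_study_area(rec["lat"], rec["lon"]):
            continue                                   # centroid on/within study area
        kept.append(rec)
    return kept

# Hypothetical stand-ins: a bounding box instead of the study-area polygon,
# and a made-up "NO_BIRDS" flag marking confirmed-absence records.
BBOX = (38.0, -74.5, 41.5, -69.5)
records = [
    {"OBTYPE": 1, "ANTYPE": 1, "lat": 40.0, "lon": -72.0},                    # kept
    {"OBTYPE": 2, "ANTYPE": 1, "lat": 40.0, "lon": -72.0},                    # not quantitative
    {"OBTYPE": 1, "ANTYPE": 9, "lat": 40.0, "lon": -72.0, "NO_BIRDS": True},  # kept (absence)
    {"OBTYPE": 1, "ANTYPE": 9, "lat": 40.0, "lon": -72.0},                    # ANTYPE != 1, no absence flag
    {"OBTYPE": 1, "ANTYPE": 1, "lat": 44.0, "lon": -72.0},                    # outside area
]
kept = select_records(records,
                      lambda lat, lon: in_bbox(lat, lon, BBOX),
                      lambda rec: rec.get("NO_BIRDS", False))
```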
Figure 6.2. Flowchart of seasonal predictive mapping process for seabirds. Letters represent geospatial inputs/outputs. Numbers represent process steps. For details, see Figure 6.3, Section 6.8.6, and Appendix 6.A.
Figure 6.3. Example geospatial information from each step of the seasonal predictive modeling process (example images are from the Dovekie winter model). Panel letters correspond to the letters in the model flowchart (Figure 6.2).

Notes:

i) Maps in this figure are intended only as examples of the modeling process and should not be used in place of the final Dovekie maps presented in Figure 6.12 and in Appendix 6.C.

ii) For a full depiction of the predictor variables shown in panel B, see Figure 6.4 and Appendix 6.B.

iii) Panels D, E, F, and H show prediction maps before error-masking to eliminate unreliable predictions far from data points.

iv) Panels D, F, and H show model predictions in Box-Cox transformed units that are not linearly related to the original SPUE units.

v) Stage II panels (D, F, and H) refer to conditional SPUE (SPUE when the species is present).

vi) Final Stage I x II model results (Panel I) are presented in back-transformed units (the original SPUE units of No. indiv./km²/15-min) and represent the expected value of unconditional SPUE (the expected average over many repeated measurements at the same location if zeros are included in the average), which is the final output of the model.

vii) See Appendix 6.A for details of statistical methods.

viii) Pink dashed lines depict the outline of the NY planning area.

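Notes iv and vi contrast Box-Cox transformed units with the original SPUE units. The sketch below shows the standard one-parameter Box-Cox transform and its inverse; the λ used for each model is not stated in this section (see Appendix 6.A), so λ = 0.5 here is an arbitrary illustrative value. Equal steps on the transformed scale correspond to increasingly large steps in SPUE, which is why panels D, F, and H cannot be read linearly.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform; defined only for y > 0 (Stage II uses SPUE > 0)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Map a value on the transformed scale back to SPUE units."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

lam = 0.5   # arbitrary illustrative value, not taken from the report
# Equal unit steps on the transformed scale are unequal steps in SPUE:
print([round(inv_box_cox(z, lam), 2) for z in (0, 1, 2, 3)])   # [1.0, 2.25, 4.0, 6.25]
```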




Table 6.2. Species recorded by MBO CSAP quantitative surveys (1980-1988) in the study region, and groupings used for analysis*.

COMMON NAME | FAMILY | SCIENTIFIC NAME

Species Individually Mapped
Black-legged Kittiwake | Laridae | Rissa tridactyla
Common Tern | Sternidae | Sterna hirundo
Common Loon | Gaviidae | Gavia immer
Cory's Shearwater | Procellariidae | Calonectris diomedea
Dovekie | Alcidae | Alle alle
Great Black-backed Gull | Laridae | Larus marinus
Great Shearwater | Procellariidae | Puffinus gravis
Herring Gull | Laridae | Larus argentatus smithsonianus
Laughing Gull | Laridae | Larus atricilla
Northern Fulmar | Procellariidae | Fulmarus glacialis
Northern Gannet | Sulidae | Morus bassanus
Pomarine Jaeger | Stercorariidae | Stercorarius pomarinus
Sooty Shearwater | Procellariidae | Puffinus griseus
Wilson's Storm-Petrel | Hydrobatidae | Oceanites oceanicus

Alcids, Less Common | Alcidae
Atlantic Puffin | Alcidae | Fratercula arctica
Common Murre | Alcidae | Uria aalge
Thick-billed Murre | Alcidae | Uria lomvia
Razorbill | Alcidae | Alca torda
Unidentified sightings in the family Alcidae | Alcidae | n/a

Coastal Waterfowl (Eiders, Mergansers, Scoters, Ducks, Loons) | Anatidae, Gaviidae
White-winged Scoter | Anatidae | Melanitta fusca
Black Scoter | Anatidae | Melanitta nigra
Surf Scoter | Anatidae | Melanitta perspicillata
Long-tailed Duck | Anatidae | Clangula hyemalis
Red-throated Loon | Gaviidae | Gavia stellata
Red-breasted Merganser | Anatidae | Mergus serrator
Common Eider | Anatidae | Somateria mollissima
Unidentified species in families Anatidae and Gaviidae | Anatidae, Gaviidae | n/a

Jaegers | Stercorariidae
Parasitic Jaeger | Stercorariidae | Stercorarius parasiticus
Long-tailed Jaeger | Stercorariidae | Stercorarius longicaudus
Unidentified sightings of Jaegers | Stercorariidae | n/a

Phalaropes | Scolopacidae
Red Phalarope | Scolopacidae | Phalaropus fulicaria
Red-necked Phalarope | Scolopacidae | Phalaropus lobatus
Unidentified sightings of Phalaropes | Scolopacidae | n/a

Shearwaters, Less Common | Procellariidae
Manx Shearwater | Procellariidae | Puffinus puffinus
Audubon's Shearwater | Procellariidae | Puffinus lherminieri
Unidentified sightings in the family Procellariidae | Procellariidae | n/a

Small Gulls, Less Common | Laridae
Ring-billed Gull | Laridae | Larus delawarensis
Bonaparte's Gull | Laridae | Larus philadelphia

Storm-Petrels, Less Common | Hydrobatidae
Leach's Storm-Petrel | Hydrobatidae | Oceanodroma leucorhoa
Band-rumped Storm-Petrel | Hydrobatidae | Oceanodroma castro
White-faced Storm-Petrel | Hydrobatidae | Pelagodroma marina
Unidentified species in the family Hydrobatidae | Hydrobatidae | n/a

Terns, Less Common | Sternidae
Royal Tern | Sternidae | Sterna maxima
Arctic Tern | Sternidae | Sterna paradisaea
Roseate Tern | Sternidae | Sterna dougallii
Least Tern | Sternidae | Sterna antillarum
Sooty Tern | Sternidae | Onychoprion fuscatus
Bridled Tern | Sternidae | Sterna anaethetus
Forster's Tern | Sternidae | Sterna forsteri
Unidentified sightings in the family Sternidae | Sternidae | n/a

Unidentified Gulls | Laridae
Unidentified sightings in the family Laridae | Laridae | n/a

**Cormorants | Phalacrocoracidae
Double-crested Cormorant | Phalacrocoracidae | Phalacrocorax auritus
Great Cormorant | Phalacrocoracidae | Phalacrocorax carbo
Unidentified sightings of Cormorants | Phalacrocoracidae | n/a

**Rare Visitors | Anatidae, Laridae, Stercorariidae
Canada Goose | Anatidae | Branta canadensis
American Black Duck | Anatidae | Anas rubripes
Brant | Anatidae | Branta bernicla
Bufflehead | Anatidae | Bucephala albeola
Mallard | Anatidae | Anas platyrhynchos
Glaucous Gull | Laridae | Larus hyperboreus
Lesser Black-backed Gull | Laridae | Larus fuscus
Iceland Gull | Laridae | Larus glaucoides glaucoides
Little Gull | Laridae | Larus minutus
South Polar Skua | Stercorariidae | Stercorarius maccormicki
Unidentified sightings of Ducks and Geese | Anatidae | n/a

**Skuas, Less Common | Stercorariidae
Great Skua | Stercorariidae | Stercorarius skua
Unidentified sightings of Skuas | Stercorariidae | n/a

* Species with one or more sightings in standardized quantitative surveys by the Manomet Bird Observatory Cetacean and Seabird Assessment Program (MBO CSAP) during the period April 1980 to October 1988.
** No predictive modeling was carried out for these groups due to limited sample size.




Survey methods have been previously described 
(Powers et al., 1980; Powers, 1983; Payne et al., 
1984; Huettmann, 2000). Briefly, a small number 
of expert observers were trained in standardized 
survey methods and placed on research vessels 
undertaking a wide variety of surveys, including 
NOAA Fisheries Service groundfish, scallop, and 
plankton surveys, US Coast Guard surveys, and 
US EPA surveys. Observers conducted surveys in 



Table 6.3. Definition of seasons.

SEASON | ABBREVIATION* | START DATE | END DATE
Spring | SP | March 1 | May 31
Summer | SU | June 1 | August 31
Fall | FA | September 1 | November 30
Winter | WI | December 1 | February 28**

*Abbreviations used in some labels and figures.
**February 29 in leap years.



Table 6.4. Summary of numbers of identifiable species, unidentified types, and contributions to species richness of each mapped species and group.

SPECIES OR GROUP NAME | # OF POSITIVELY IDENTIFIED SPECIES IN GROUP | # OF UNIDENTIFIED CATEGORIES IN GROUP | MINIMUM CONTRIBUTION TO SPECIES RICHNESS | MAXIMUM CONTRIBUTION TO SPECIES RICHNESS

Individually mapped species
Black-legged Kittiwake | | | |
Common Loon | | | |
Common Tern | | | |
Cory's Shearwater | | | |
Dovekie | | | |
Great Black-backed Gull | | | |
Great Shearwater | | | |
Herring Gull | | | |
Laughing Gull | | | |
Northern Fulmar | | | |
Northern Gannet | | | |
Pomarine Jaeger | | | |
Sooty Shearwater | | | |
Wilson's Storm-Petrel | | | |
Subtotals | 14 | | 14 | 14

Modeled species groups
Alcids, less common | 4 | 4 | | 4
Coastal Waterfowl | 7 | 3 | | 7
Jaegers | 2 | 1 | | 2
Phalaropes | 2 | 1 | | 2
Shearwaters, less common | 2 | 2 | | 2
Small Gulls, less common | 2 | | | 2
Storm-Petrels, less common | 3 | 1 | | 3
Terns, less common | 7 | 2 | | 7
Unidentified gulls | | 2 | |
Subtotals | 29 | 16 | 8 | 29

Non-modeled species groups
Cormorants | 2 | 1 | 1 | 2
Rare Visitors | 10 | 2 | 1 | 10
Skuas, less common | 1 | 1 | 1 | 1
Subtotals | 13 | 4 | 3 | 13

GRAND TOTALS
Modeled | 43 | 16 | 22 | 43
Not modeled | 13 | 4 | 3 | 13
All | 56 | 20 | 25 | 56



Table 6.5. Numbers of unique shipboard survey locations in which each species or species group was seen, overall and by season.

SPECIES OR GROUP NAME | TOTAL N | N SPRING | N SUMMER | N FALL | N WINTER

Individually mapped species
Black-legged Kittiwake | 1,391 | 260 | 2 | 469 | 660
Common Loon | 217 | 112 | 4 | 60 | 41
Common Tern | 171 | 57 | 80 | 33 | 1
Cory's Shearwater | 458 | 3 | 301 | 153 | 1
Dovekie | 161 | 37 | 0 | 27 | 97
Great Black-backed Gull | 2,172 | 788 | 176 | 587 | 621
Great Shearwater | 951 | 33 | 502 | 407 | 9
Herring Gull | 4,252 | 1,671 | 282 | 1,565 | 734
Laughing Gull | 404 | 47 | 115 | 236 | 6
Northern Fulmar | 392 | 228 | 43 | 45 | 76
Northern Gannet | 2,302 | 1,142 | 9 | 537 | 614
Pomarine Jaeger | 130 | 14 | 7 | 108 | 1
Sooty Shearwater | 205 | 88 | 114 | 3 | 0
Wilson's Storm-Petrel | 1,680 | 300 | 1,172 | 207 | 1

Species groups
Alcids, less common | 147 | 80 | 0 | 5 | 62
Coastal Waterfowl | 300 | 120 | 0 | 67 | 113
Jaegers | 79 | 13 | 8 | 58 | 0
Phalaropes | 294 | 247 | 7 | 36 | 4
Shearwaters, less common | 196 | 16 | 93 | 87 | 0
Small Gulls, less common | 210 | 53 | 3 | 110 | 44
Storm-Petrels, less common | 225 | 46 | 126 | 53 | 0
Terns, less common | 127 | 57 | 49 | 21 | 0
Unidentified gulls | 291 | 55 | 19 | 163 | 54
Cormorants | 66 | 13 | 9 | 21 | 23
Rare Visitors | 42 | 18 | 1 | 11 | 12
Skuas, less common | 36 | 12 | 2 | 11 | 11

Special category
No birds sighted | 2,299 | 511 | 847 | 812 | 129
Number of unique locations | 9,148 | 2,549 | 2,674 | 2,777 | 1,148


15-minute periods, where each period was considered an individual transect. In a small number of instances (313 data points, less than 1.5% of the data records) the actual survey time was slightly less than 15 minutes (7 to 14 minutes); this deviation from the standard protocol was considered minor and no explicit correction was made (it is partially accounted for simply by the corresponding decrease in transect area: for a given ship speed, a shorter transect will be shorter in length and smaller in area). Seabirds were identified to the lowest possible taxonomic level, usually species, and counted within a fixed strip width of 300 m on one side of the ship, traveling on a straight course at a constant speed (generally 8-12 knots). The starting point, constant bearing, and constant speed during each fixed 15-minute survey period were recorded using the ship's instruments and used to define the area of the rectangular strip covered by the survey. For purposes of this analysis, unless otherwise noted, the centroid of each rectangular strip was used to define its spatial location.

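The link between survey duration, ship speed, and transect footprint described above can be illustrated with a flat-plane sketch. Assumptions: the 300 m strip width stated in the text, a local east/north frame in km measured from the transect start point, and a hypothetical `side` argument, since the text says only that birds were counted on one side of the ship. At 10 knots (within the stated 8-12 knot range), a 15-minute transect is about 4.63 km long, and a 7-minute transect covers 7/15 of the standard area.

```python
import math

KM_PER_NAUTICAL_MILE = 1.852

def transect_geometry(speed_knots, minutes=15.0, strip_width_km=0.3,
                      bearing_deg=0.0, side=1):
    """Length (km), area (km^2) and centroid offset (east_km, north_km).

    Flat local approximation measured from the transect start point.
    bearing_deg is clockwise from north; side=+1 places the strip to the
    right of the track and side=-1 to the left (hypothetical parameter).
    """
    length_km = speed_knots * KM_PER_NAUTICAL_MILE * minutes / 60.0
    area_km2 = length_km * strip_width_km
    b = math.radians(bearing_deg)
    along = (math.sin(b), math.cos(b))    # unit vector along the track
    right = (math.cos(b), -math.sin(b))   # unit vector to the right of track
    east = 0.5 * length_km * along[0] + side * 0.5 * strip_width_km * right[0]
    north = 0.5 * length_km * along[1] + side * 0.5 * strip_width_km * right[1]
    return length_km, area_km2, (east, north)

full = transect_geometry(10.0)                  # standard 15-minute survey
short = transect_geometry(10.0, minutes=7.0)    # truncated 7-minute survey
```

The centroid returned here is the point the analysis uses to locate each transect; converting the km offsets to latitude and longitude would use a short-distance approximation like the one described in Section 6.8.1.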
In the NY study region, transect length averaged 4.3 km (SD=0.8 km), ranging from 0.4 to 6.5 km, and transect area averaged 4.4 km² (SD=2.9 km²), ranging from 0.1 to 20 km². Average nearest-neighbor distance between transect centroids ranged from 2.8 km to 3.6 km depending on season, and the minimum nearest-neighbor distance ranged from 0.03 km to 0.26 km, depending on season (Figure 6.1). Given this spatial distribution of observations, the minimum length scale of features that can be resolved in all seasons is approximately 0.5




Table 6.6. Summary of seasons chosen for predictive modeling.

SPECIES OR GROUP | SPRING | SUMMER | FALL | WINTER | NUMBER OF SEASONS MODELED

Individually mapped species
Black-legged Kittiwake | Modeled | Not modeled | Modeled | Modeled | 3
Common Loon | Modeled | Not modeled | Modeled | Modeled | 3
Common Tern | Modeled | Modeled | Modeled | Not modeled | 3
Cory's Shearwater | Not modeled | Modeled | Modeled | Not modeled | 2
Dovekie | Combined* | Absent† | Combined* | Modeled | 1
Great Black-backed Gull | Modeled | Modeled | Modeled | Modeled | 4
Great Shearwater | Not modeled | Modeled | Modeled | Not modeled | 2
Herring Gull | Modeled | Modeled | Modeled | Modeled | 4
Laughing Gull | Modeled | Modeled | Modeled | Not modeled | 3
Northern Fulmar | Modeled | Modeled | Modeled | Modeled | 4
Northern Gannet | Modeled | Not modeled | Modeled | Modeled | 3
Pomarine Jaeger | Not modeled | Not modeled | Modeled | Not modeled | 1
Sooty Shearwater | Modeled | Modeled | Not modeled | Absent† | 2
Wilson's Storm-Petrel | Modeled | Modeled | Modeled | Not modeled | 3
Subtotal | | | | | 38

Species groups
Alcids, less common | Modeled | Absent† | Not modeled | Modeled | 2
Coastal Waterfowl | Modeled | Absent† | Modeled | Modeled | 3
Jaegers | Not modeled | Not modeled | Modeled | Absent† | 1
Phalaropes | Modeled | Not modeled | Not modeled | Not modeled | 1
Shearwaters, less common | Not modeled | Modeled | Modeled | Absent† | 2
Small Gulls, less common | Modeled | Not modeled | Modeled | Modeled | 3
Storm-Petrels, less common | Modeled | Modeled | Modeled | Absent† | 3
Terns, less common | Modeled | Modeled | Not modeled | Absent† | 2
Unidentified gulls | Modeled | Not modeled | Modeled | Modeled | 3
Cormorants | Not modeled | Not modeled | Not modeled | Not modeled | 0
Rare Visitors | Not modeled | Not modeled | Not modeled | Not modeled | 0
Skuas, less common | Not modeled | Not modeled | Not modeled | Not modeled | 0
Subtotal | | | | | 20

Special category
No birds sighted** | Modeled | Modeled | Modeled | Modeled | 4
Subtotal | | | | | 4

Total number of seasonal predictive models: 62

* Sightings from these seasons were combined with sightings from Winter for modeling.
† Species or group not detected in the Manomet dataset in this season.
** Surveys that specifically noted the absence of any seabirds were modeled separately.



[Figure 6.4 panels: Bottom depth (BATH); Distance from shore; Signed distance from shelf (SSDIST); Mean sediment grain size (PHIM); Water-column stratification (STRT); Sea surface temperature (SST); Surface turbidity measure (TUR); Surface chlorophyll-a conc. (CHL); Zooplankton biomass (ZOO)]



km (this is determined by the greater of the minimum transect length or twice the minimum nearest-neighbor distance). Thus the grid resolution chosen for the present study (~0.8 km) approaches the finest possible
resolution given the limits of the data. This is one reason that cross-validation, described below, is essential 
to characterize the accuracy of final mapped predictions. Cross-validation accuracy assessments take into 
account horizontal positional error as well as other sources of uncertainty. 
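The resolvable-scale rule above (the greater of the minimum transect length and twice the minimum nearest-neighbor distance) can be written out directly. This sketch uses a brute-force nearest-neighbor search on planar km coordinates (an assumption; adequate for a few thousand points) and, as a further assumption, takes the largest of the seasonal minima so the scale holds in every season. With the values reported above (0.4 km minimum transect length; seasonal minima spanning 0.03 to 0.26 km) it gives 0.52 km, consistent with the ~0.5 km quoted.

```python
import math

def nearest_neighbor_distances(points):
    """Brute-force nearest-neighbor distance for each (x_km, y_km) point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        result.append(best)
    return result

def min_resolvable_scale_km(min_transect_length_km, seasonal_min_nn_km):
    """Greater of the minimum transect length and twice the minimum NN distance.

    Assumption: the largest seasonal minimum is used so the scale holds in all seasons.
    """
    return max(min_transect_length_km, 2.0 * max(seasonal_min_nn_km))

# Values reported in the text: 0.4 km minimum length; 0.03-0.26 km seasonal minima.
print(min_resolvable_scale_km(0.4, [0.03, 0.26]))   # 0.52
```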

The CSAP database contained 16,899 species sighting and abundance records (plus 2,299 records of surveys 
in which no seabirds were sighted, i.e., confirmed absences) from quantitative transect surveys at 9,099 unique 
locations in our study area, spanning the time period from April 20, 1980 to October 3, 1988 (a total of 9,148 
unique survey locations in all seasons; Figure 6.1, Tables 6.2, 6.4, 6.5). The age of this dataset is an admitted 




Figure 6.4. Potential environmental predictor variables considered for each predictive model. For dynamic variables (STRT, SST, TUR, CHL, ZOO) only the Winter map is shown. See Appendix 6.B for more details on predictor variables, including all four seasonal maps for each of the dynamic variables and maps of the variables after transformation for statistical analysis. Full legends are given in Appendix 6.B.




limitation of this analysis, and implications of using data that are 24-32 years old at the time of the present 
report's publication (2012) are discussed in Appendix 6.A (Section 6.A.14.) and in the Discussion (Section 
6.10.). Fifty-six identifiable species of seabirds and waterfowl were recorded (Table 6.2, Table 6.4). These 
species were organized into groups for modeling purposes where necessary due to small sample sizes (see 
below and Table 6.2, Table 6.4, Table 6.5, Table 6.6). All "unidentified" bird sighting categories were identified 
at least to family level, which allowed assignment of those sighting records to appropriate groups. Most of the 
"unidentified" categories are likely to represent species positively identified elsewhere in the database that 
could not be positively identified in a particular sighting record, rather than species not otherwise represented 
in the database (although it is possible that a few of the "unidentified" records truly represent species not 
recorded elsewhere). Fifty-six is thus a conservative estimate of the number of seabird and opportunistically seafaring species present in the 1980-1988 period in the NY Bight study region. Because the MBO CSAP surveys were ship-based and most survey effort was focused >10 km offshore (Pittman and Huettmann, 2006), the total number of species that may be observed over nearshore waters could be substantially higher.

Temporal patterns of seabird occurrence were summarized with monthly histograms (Appendix 6.C), and 
used to further divide bird occurrence into seasons for modeling as described in Section 6.8.3. Although the 
Manomet dataset provides good coverage for offshore environments, survey effort is reduced within 10 km of shore (Pittman and Huettmann, 2006), and coverage excludes most of Long Island Sound. To reduce the effects of this bias on temporal occurrence histograms for nearshore species (species spending substantial amounts of time <10 km from shore), we obtained non-quantitative data from an online dataset (eBird, http://ebird.org/content/ebird). The eBird dataset consists of opportunistic, publicly available bird observations made by recreational and professional bird watchers and was developed by the Cornell Lab of Ornithology and National Audubon Society. Seabird sighting frequencies were extracted from the open access eBird database in November 2011 for the New England/Mid Atlantic Bird Conservation Region, and used to produce monthly histograms
of sighting frequency that are presented alongside the CSAP histograms for comparison (see Appendix 6.C). 
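A minimal sketch of a monthly sighting-frequency summary and the Table 6.3 season assignment follows. It assumes one plausible definition of sighting frequency (the fraction of surveys in a month on which the species was recorded), which normalizes for uneven monthly effort; the exact definitions behind the CSAP and eBird histograms in Appendix 6.C may differ. The input data are toy values.

```python
from collections import Counter

# Season boundaries from Table 6.3 (whole calendar months).
SEASON_BY_MONTH = {12: "WI", 1: "WI", 2: "WI", 3: "SP", 4: "SP", 5: "SP",
                   6: "SU", 7: "SU", 8: "SU", 9: "FA", 10: "FA", 11: "FA"}

def monthly_sighting_frequency(surveys):
    """Fraction of surveys in each month on which the species was sighted.

    surveys: iterable of (month, detected) pairs, month in 1..12.
    Months with no survey effort map to None rather than 0.
    """
    effort, hits = Counter(), Counter()
    for month, detected in surveys:
        effort[month] += 1
        hits[month] += int(bool(detected))
    return {m: (hits[m] / effort[m] if effort[m] else None) for m in range(1, 13)}

example = [(1, True), (1, False), (2, False), (7, True), (7, True), (7, False)]
freq = monthly_sighting_frequency(example)
by_season = Counter(SEASON_BY_MONTH[m] for m, seen in example if seen)
```

Returning None for months without effort keeps "not surveyed" distinct from "surveyed but not seen", which matters for nearshore species that the offshore surveys rarely reached.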



6.8.3. Processing of quantitative seabird data for analysis

Observations were separated by season; season definitions are given in Table 6.3. For each season, unique survey locations were identified (2,549 locations in spring, 2,674 in summer, 2,777 in fall, and 1,148 in winter), defining the sampling configuration for each season (Figure 6.1). For each species or group sighting record in each season, the "COUNT" field of the CSAP database (the number of birds of that species observed during the timed survey) was divided by the corresponding survey tract area to yield an index of relative abundance standardized by both time (15 minutes) and area (km² of transect footprint), which we hereafter refer to as sightings per unit effort (SPUE). Units of SPUE are individual birds detected per square kilometer per 15-minute survey. When species were grouped (see below), the "COUNT" field was summed over all species in the group occurring in a given survey. Because observers were searching for all seabird species during surveys conducted at each unique location, the absence of a species record at one of these points generated an SPUE of 0 for that species at that location in that season. Note that these calculations assume perfect detectability; the implications of this assumption are discussed in Appendix 6.A (Section 6.A.14.). It was very uncommon for surveys to be centered at precisely the same location in the same season more than once over the 9 survey years (this occurred 6 times in spring, 2 in summer, 12 in fall, and 2 in winter). Where this did occur, we used the weighted average SPUE for each species at that location, with weights proportional to survey tract areas (surveys with greater tract area received greater weight).

This process resulted in four sets (one for each season) of georeferenced point measurements of SPUE, representing all seabird sightings that occurred in each season over the 9-year survey period. These datasets were coded in two ways for further analysis:

(1) as binary variables indicating the occurrence (presence [1] or absence [0]) of a given species or group (Figure 6.2A, left: Stage I); and

(2) as non-zero continuous variables representing the relative index of abundance (SPUE) of each observed species at each location where it was present (Figure 6.2A, right: Stage II).

Note the distinction between measures of occurrence, which refer to the frequency with which a species was sighted among independent 15-minute surveys, and abundance, which refers to the number of individuals of a given species or species group sighted within a 15-minute survey, scaled to the transect area surveyed (SPUE). Note also that because detectability was not accounted for and individuals were not tracked over time, the indices of occurrence used here are not equivalent to true frequencies of occurrence, and the indices of abundance are not equivalent to population abundance; they are only relative proxy measures.

6.8.4. Grouping and selection of species for modeling

All individually identified species with 40 or more sightings in at least one season (Spring, Summer, Fall, Winter; defined in Section 6.8.3.) were selected for predictive modeling (Table 6.5, Table 6.6). Remaining species were grouped according to shared ecological and life history patterns and/or shared spatial and temporal patterns of occurrence (Table 6.2, Appendix 6.C; Pittman and Huettmann, 2006), and groups were modeled where 40 or more sightings occurred in at least one season. Remaining groups had no more than 23 sightings in any one season and were considered too rare to model. For each species or group, a season was only modeled if at least 40 records existed for that season, with the following exceptions: Common Terns were modeled in Fall despite only 33 sighting records, and Dovekie sighting records from Fall (n=27) and Spring (n=37) were combined with Winter (n=97) observations (Table 6.6). For groups and/or individual species that were not modeled, as well as for seasons in which a given species or group was too rare to model, point maps of all occurrences are presented.

We note that other modeling methods might be capable of generating reliable predictions when fewer than 40 records are present, but we feel that the uncertainties involved in extrapolating predictions from so few data make raw point maps of occurrence more suitable for these limited data cases, unless other information is available to predict the distribution of these species based on biological characteristics and environmental preferences. The abundance of these birds is most conservatively treated as unknown (not zero) at locations in between sample points; the range of observed abundances over the entire study area can be used as a guide to the range of possible values at unsampled locations.

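The SPUE calculation, zero-filling, duplicate-location averaging, and two-way coding described in Section 6.8.3 can be sketched as follows. The dictionary layout and the species codes (`HERG`, `GBBG`) are hypothetical. Note that an area-weighted mean of SPUE is algebraically the pooled COUNT divided by the pooled tract area.

```python
from collections import defaultdict

def spue(count, area_km2):
    """Birds per km^2 per 15-minute survey."""
    return count / area_km2

def seasonal_spue(surveys, species):
    """Area-weighted mean SPUE per location for one season.

    surveys: iterable of dicts with hypothetical keys 'location',
             'area_km2' and 'counts' (species code -> COUNT).
    species: tuple of species codes; more than one code = a species group,
             whose COUNTs are summed within each survey.
    """
    weighted_sum = defaultdict(float)   # sum of area * SPUE
    total_area = defaultdict(float)     # sum of areas (the weights)
    for s in surveys:
        count = sum(s["counts"].get(sp, 0) for sp in species)   # no record -> 0
        weighted_sum[s["location"]] += s["area_km2"] * spue(count, s["area_km2"])
        total_area[s["location"]] += s["area_km2"]
    return {loc: weighted_sum[loc] / total_area[loc] for loc in total_area}

def stage_codings(spue_by_location):
    """Stage I: presence/absence; Stage II: SPUE only where present."""
    stage1 = {loc: int(v > 0) for loc, v in spue_by_location.items()}
    stage2 = {loc: v for loc, v in spue_by_location.items() if v > 0}
    return stage1, stage2

surveys = [
    {"location": "A", "area_km2": 2.0, "counts": {"HERG": 4}},
    {"location": "A", "area_km2": 1.0, "counts": {}},   # repeat visit, none seen
    {"location": "B", "area_km2": 1.5, "counts": {"GBBG": 3}},
]
herg = seasonal_spue(surveys, ("HERG",))
stage1, stage2 = stage_codings(herg)
```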

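The season-selection rule of Section 6.8.4 (at least 40 records per season, with the Common Tern and Dovekie exceptions) can be expressed as a small function. The counts in the usage lines are taken from Table 6.5; the function names are illustrative only, and the rule is applied here to the per-season location counts of that table.

```python
SEASONS = ("SP", "SU", "FA", "WI")   # order of Table 6.3
MIN_RECORDS = 40                     # threshold stated in Section 6.8.4

def seasons_to_model(counts_by_season, forced=frozenset()):
    """Seasons with at least MIN_RECORDS sightings, plus explicit exceptions."""
    return [s for s in SEASONS
            if counts_by_season.get(s, 0) >= MIN_RECORDS or s in forced]

def merge_into(counts_by_season, sources, target):
    """Pool counts from `sources` into `target` (the Dovekie case)."""
    merged = {s: n for s, n in counts_by_season.items() if s not in sources}
    merged[target] = sum(counts_by_season.get(s, 0) for s in (*sources, target))
    return merged

kittiwake = seasons_to_model({"SP": 260, "SU": 2, "FA": 469, "WI": 660})
tern = seasons_to_model({"SP": 57, "SU": 80, "FA": 33, "WI": 1}, forced={"FA"})
dovekie_counts = merge_into({"SP": 37, "SU": 0, "FA": 27, "WI": 97}, ("SP", "FA"), "WI")
dovekie = seasons_to_model(dovekie_counts)
cormorants = seasons_to_model({"SP": 13, "SU": 9, "FA": 21, "WI": 23})
print(kittiwake, tern, dovekie_counts, dovekie, cormorants)
```

The results match Table 6.6: three seasons for Black-legged Kittiwake and Common Tern, a single pooled winter model for Dovekie (161 records), and no model for Cormorants.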

6.8.5. Potential environmental predictors 

Based on available high-resolution data coverage within our study region and previous studies of environmental 
correlates of seabird distribution and abundance, we identified 11 potential environmental predictor variables 
(Figure 6.4, Chapter 4, Appendix 6.B, Online Supplement 6.1). Short codes used to refer to each predictor 
are given in parentheses: bottom depth (BATH), bottom slope (SLOPE), bottom slope-of-the-slope (SLPSLP), 
mean grain size of bottom surficial sediments (PHIM), linear distance from shore (DIST), signed linear distance 
from the shelf edge (SSDIST), sea surface temperature (SST), water column stratification (STRT), sea surface 
chlorophyll concentration (CHL), sea surface turbidity (TUR), and near surface zooplankton biomass (ZOO). 
It is important to note that this was only the candidate predictor set; a model selection process described 
below (Section 6.8.7.) narrowed down the set of predictor variables that contributed to any particular species/ 
group model in any particular season. We also wish to emphasize this is not a comprehensive list of all 
potential environmental influences on seabird abundance in the area, but instead is limited by such factors as 
the availability of high-resolution datasets collected over sufficiently long time frames to allow calculation of 
climatologies. 

The 11 identified predictors (Figure 6.4) were derived from 6 types of data: bathymetry and coastline, surficial 
sediments, water-column sampling, satellite ocean surface temperature, satellite ocean surface color, and 
ship-based plankton tows (Chapter 4). These datasets are described individually in Chapter 4, Appendix 6.B, 
and Online Supplement 6.1. For time-varying predictors (SST, STRT, CHL, TUR, ZOO), long-term average 
(climatological) ocean conditions were mapped by season (defined as in Table 6.3). Due to constraints of 
data availability, the climatological period of the environmental variables does not always match that of the 
seabird survey data. The implicit assumption is that the long-term climatological patterns of ocean conditions 
are reflective of the 1980-1988 period. Implications of this assumption are discussed in Appendix 6.A (Section
6.A.14.). 

All grid processing was carried out using ArcGIS 9.3.1 with the Spatial Analyst extension (Environmental Systems Research Institute [ESRI], Redlands, CA), Geostatistical Analyst extension (ESRI), XTools Pro 6.2.1 for ArcGIS 9.x (Data East LLC, Novosibirsk, Russia), and Hawth's Tools for ArcGIS 9.x (Beyer, 2004). All



predictor grids were co-registered on the 30 arc-second sampling grid (Chapter 4) and clipped to the spatial 
extent of the study area (Chapter 1, Figure 1.2). Grids were exported from ArcGIS in the .FLT binary floating 
point raster format for subsequent processing. All environmental predictors used in this study and associated 
metadata are available by contacting the corresponding author. 
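Grids in the .FLT format are raw 32-bit floating point values accompanied by a plain-text .hdr file. The reader below is a sketch assuming the usual ESRI header keys (ncols, nrows, NODATA_value, byteorder) and rows stored from north to south; it uses only the Python standard library and is not the workflow used for this report. The round-trip demo writes a tiny synthetic grid to a temporary directory.

```python
import struct
import tempfile
from pathlib import Path

def read_flt(flt_path):
    """Read an ESRI .flt grid and its .hdr sidecar into a list of rows."""
    flt_path = Path(flt_path)
    header = {}
    for line in flt_path.with_suffix(".hdr").read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:
            header[parts[0].lower()] = parts[1]
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = float(header.get("nodata_value", "-9999"))
    # byteorder LSBFIRST = little-endian, MSBFIRST = big-endian
    endian = "<" if header.get("byteorder", "LSBFIRST").upper() == "LSBFIRST" else ">"
    values = struct.unpack(f"{endian}{nrows * ncols}f", flt_path.read_bytes())
    # Rows run north to south; NODATA cells become None.
    grid = [[None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
            for r in range(nrows)]
    return header, grid

# Round-trip demo on a tiny 2 x 3 grid.
with tempfile.TemporaryDirectory() as tmp:
    flt = Path(tmp) / "demo.flt"
    flt.with_suffix(".hdr").write_text(
        "ncols 3\nnrows 2\nxllcorner -74.0\nyllcorner 38.0\n"
        "cellsize 0.008333333\nNODATA_value -9999\nbyteorder LSBFIRST\n")
    flt.write_bytes(struct.pack("<6f", 1.5, 2.0, -9999.0, 4.0, 5.0, 6.25))
    demo_header, demo_grid = read_flt(flt)
```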

Some predictors were gridded or re-sampled using interpolation algorithms (e.g., kriging, bilinear interpolation). 
Uncertainty in predictors resulting from interpolation was ignored for purposes of model formulation; that is, 
predictor values at each location were treated as perfectly known. This is likely to cause parametric estimates 
of uncertainty to underestimate prediction uncertainty when predictions are made at locations not included 
in model fitting. We address this uncertainty with the model evaluation, cross-validation, and uncertainty 
calibration procedures described in Appendix 6.A (Section 6.A.12.). Chapter 4, Appendix 6.B, and Online 
Supplement 6.1 also provide estimates of the relative uncertainty of each predictor layer as it is discussed. 

6.8.6. Seasonal predictive modeling 

A flowchart of the seasonal predictive modeling process is shown in Figure 6.2. In this figure, capital letters 
indicate the geospatial data and statistics that form the inputs and outputs of the modeling process. Numbers 
represent steps of the modeling process that transform geospatial information from step to step. Appendix 6.A, 
Statistical Methods, gives a detailed explanation of each process step and relevant equations and references. 

For each season with sufficient data within each species/group selected for predictive modeling, we model the 
transect estimates of SPUE as point samples (located at the centroid of each transect) of two spatial random 
processes, Stage I and Stage II (Figure 6.2A). Stage I uses binary (presence/absence) data from the MBO 
CSAP surveys (Figure 6.2A, left). Stage II uses relative abundance (SPUE) observations for each species or 
group from the same surveys, but does not consider locations where SPUE=0 (i.e., Stage II only considers 
observations where the species is present) (Figure 6.2A, right). This two-stage modeling approach has been 
successfully applied to model marine species distributions (Stefansson, 1996; Ver Hoef and Jansen, 2007; 
Winter et al., 2011) and performs well in tests of alternative species distribution models (Potts and Elith, 2006). 

Within each stage of the model, we use a regression-kriging (RK) model framework to account for both seabird- 
environment relationships and spatial structure (Hengl et al., 2007). Both Stage I and Stage II models include 
two components: a trend model that uses a generalized linear model (GLM) (Figure 6.2: boxes C, D and steps 
1, 2) and incorporates environmental predictors (Figure 6.2B), and a geostatistical model that accounts for 
spatial autocorrelation in the residuals from the GLM (Figure 6.2: boxes E, F and steps 3, 4). Statistically, this 
involves an assumption that the spatial processes are separable into trend and residual components. That is, 
at each location the observed value can be modeled as a sum of a deterministic linear combination of known 
predictor values (trend), and a realization of a spatial random field (residual). For more detailed discussion and 
mathematical treatment of RK models, see Hengl et al. (2007). Implications of the assumption of separability 
are discussed in Appendix 6.A (Section 6.A.14.). 
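The separability assumption can be made concrete with a minimal regression-kriging sketch. The code below is a simplified stand-in, not the report's implementation. It assumes an ordinary least-squares trend in place of the GLM (the report fits GLMs and estimates coefficients by generalized least squares via gstat), simple kriging of the residuals with a known zero mean, and an exponential covariance with known sill and range and no nugget. The names `regression_kriging`, `exp_cov`, `sill`, and `range_` are hypothetical. The prediction at each location is the trend value plus the kriged residual.

```python
import numpy as np

def exp_cov(d, sill, range_):
    """Exponential covariance as a function of distance (assumed model, no nugget)."""
    return sill * np.exp(-d / range_)

def regression_kriging(X_obs, xy_obs, z_obs, X_new, xy_new, sill=1.0, range_=10.0):
    """Trend (least squares) + simple kriging of residuals; hypothetical sketch."""
    # 1) Trend model: least-squares fit on environmental predictors,
    #    standing in for the GLM trend model described in the text.
    A = np.column_stack([np.ones(len(z_obs)), X_obs])
    beta, *_ = np.linalg.lstsq(A, z_obs, rcond=None)
    resid = z_obs - A @ beta

    # 2) Residual model: simple kriging (known zero mean) of the residuals.
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)
    weights = np.linalg.solve(exp_cov(d_obs, sill, range_),
                              exp_cov(d_new, sill, range_).T)
    resid_hat = weights.T @ resid

    # 3) Separability: prediction = trend + kriged residual.
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    return A_new @ beta + resid_hat
```

Because this sketch has no nugget, it reproduces the training observations exactly at their own locations; adding a nugget (the unstructured "white noise" component) would make it smooth the data instead of interpolating them.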

The trend (Figure 6.2C, D) and residual (Figure 6.2E, F) components of the Stage I and Stage II models are 
combined probabilistically (Stage I: Figure 6.2 step 5) or additively (Stage II: Figure 6.2 step 6) to yield the final, 
combined Stage I (Figure 6.2G) and Stage II (Figure 6.2H) models. Stage I and Stage II are then multiplied 
(Figure 6.2 step 7) to produce the final prediction of relative abundance (Figure 6.2I), the expected long-term 
average SPUE in repeated 15-minute standardized surveys randomly scattered in time within the modeled 
season during the 1980-1988 survey period. The multiplication of Stage I and Stage II arises from the fact that 
Stage I is the probability of presence, and Stage II is the abundance when present (Appendix 6.A). 
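The multiplication step follows from the identity E[SPUE] = P(present) x E[SPUE | present]. The sketch below shows one plausible way to carry out steps 5 to 7, under assumptions not stated in this section: a logit link for Stage I, with the trend and kriged residual added on the link scale before back-transformation, and a log scale for Stage II with a naive back-transform (no bias correction). The transforms actually used are documented in Appendix 6.A; `combine_stages` is a hypothetical name.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_stages(trend1, resid1, trend2, resid2):
    """Hypothetical sketch of Figure 6.2 steps 5-7 (assumed link functions).

    trend1, resid1 : Stage I trend and kriged residual on the logit scale (assumed)
    trend2, resid2 : Stage II trend and kriged residual on the log scale (assumed)
    """
    p_present = logistic(np.asarray(trend1) + resid1)            # step 5
    abundance_if_present = np.exp(np.asarray(trend2) + resid2)   # step 6
    return p_present * abundance_if_present                      # step 7: Stage I x II
```

For example, a cell with a 50% presence probability and an expected abundance of 4 when present yields an expected SPUE of 2.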

The final Stage I x II model prediction is then used to calculate a variety of model validation and evaluation 
statistics (Figure 6.2 box J and step 8), which are reported in the diagnostic tables shown later in this document, 
and in Appendix 6.C and Online Supplement 6.2. Due to the number and variety of diagnostic statistics 
calculated, they are not described in detail here. However, the most important diagnostic statistics are derived 
from a procedure called cross-validation (Cressie, 1993; Goovaerts, 1997; Deutsch and Journel, 1998; Hengl
et al., 2007; Ross, 2007; Fox, 2008). In this procedure, 50% of the data for each species/group in each season
are selected at random to be used to fit the model (the "training set"). The remaining 50% of data form the
"validation set" (also called the "holdout set"). The model fit using only the training set is subsequently used to 
predict observations in the validation set, allowing an independent assessment of the accuracy and predictive 
performance of the model when confronted with new data. Cross-validation is a powerful tool that allows 
assessment not only of prediction accuracy, but of the degree to which modeled uncertainty values capture the 
true uncertainty encountered in out-of-training-set prediction. 
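A minimal sketch of the random 50/50 holdout split described above follows. It assumes records are indexed 0 to n-1, and uses a trivial placeholder "model" (the training-set mean) purely to show that errors are scored only on held-out records. `holdout_split` and the SPUE values are hypothetical, and the seed is arbitrary.

```python
import numpy as np

def holdout_split(n, train_frac=0.5, seed=None):
    """Randomly partition record indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

# Hypothetical SPUE records for one species in one season
spue = np.array([0.0, 1.2, 0.0, 3.4, 0.5, 0.0, 2.2, 0.9, 0.0, 4.1])
train, valid = holdout_split(len(spue), seed=1)

# Placeholder "model": predict the training-set mean everywhere,
# then score it only on the held-out (validation) records.
pred_valid = np.full(len(valid), spue[train].mean())
mae_valid = np.mean(np.abs(spue[valid] - pred_valid))
```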

Although models were fit using only 50% of data (the training set), the final maps presented in this document 
and in Appendices 6.C and 6.D were produced by applying the final model to the entire dataset (training and 
validation sets combined). Thus the cross-validation diagnostic statistics provide a conservative estimate of 
model performance, as the final maps are based on a dataset which is twice the size of the training and 
validation subsets. Online Supplement 6.2 provides detailed diagnostic reports on each step of the model 
fitting process, and shows the maps produced using the restricted 50% training subset. 

Figure 6.3 provides an example of each of the lettered inputs and outputs in the Figure 6.2 flowchart, using 
the example of the Dovekie (Alle alle) winter seasonal model. Figure 6.4 shows the potential environmental 
predictor set, with only the winter panel shown for dynamic variables (SST, STRT, CHL, TUR, ZOO). Note 
that the predictor variables shown in Figure 6.4 are untransformed, whereas transformed versions of some 
variables were used for statistical modeling. Transformed versions, and all four seasonal panels for dynamic 
variables, can be found in Appendix 6.B. 

Unless otherwise noted, all predictive modeling analyses were carried out in Matlab R2011a (version 7.12.0.635;
The Mathworks, Natick, MA), with the Statistics, Mapping, and Image Processing toolboxes (Mathworks), 
mGstat (Hansen, 2009, http://mgstat.sourceforge.net/), ROC (Cardillo, 2008), partest (Cardillo, 2008), lowess 
(Burkey, 2009), ploterr (Zorgiebel, 2008), boxcoxlm (Dror, 2006), and additional custom code available by 
contacting the authors. Geostatistical algorithms (kriging, generalized least squares estimation of trend model 
coefficients, variogram estimation, and variogram model fitting) were implemented by calling the program 
gstat (standalone version 2.5.1; Pebesma and Wesseling, 1998; http://www.gstat.org/) from within Matlab, 
with the help of the mGstat toolbox. GLM model selection was carried out by calling the R package glmulti 
(Calcagno and Mazancourt, 2010; Calcagno, 2011; http://cran.r-project.org/web/packages/glmulti/index.html)
from within Matlab. All Matlab code is available from the corresponding author on request. 

6.8.7. Generation of annual maps 

The seasonal modeling process described in Figure 6.2, Section 6.8.6., and Appendix 6.A was repeated for 
each species and species group, in each season for which sufficient data existed to estimate the model (Table 
6.6). Seasonal predictions were then summed to produce annual climatological maps of SPUE for each species 
and species group (Appendix 6.A., Section 6.A.13.). Seasonal Stage I predictions (presence probability maps) 
were also combined probabilistically to produce "integrated annual presence probability" maps, which reflect 
the probability of each grid cell being occupied in at least one season of the year, assuming independence of 
seasonal predictions (Appendix 6.A., Section 6.A.13.). 
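Both annual products can be written in a few lines. The sketch assumes the seasonal maps are stacked along the first axis of a NumPy array, one slice per modeled season (unmodeled seasons are simply omitted). It applies the independence assumption stated above, under which P(occupied in at least one season) = 1 - prod(1 - p_s). `annual_maps` is a hypothetical name.

```python
import numpy as np

def annual_maps(seasonal_spue, seasonal_p):
    """seasonal_spue, seasonal_p : arrays shaped (n_modeled_seasons, ny, nx)."""
    # Annual climatological SPUE: sum of the seasonal Stage I x II predictions
    annual_spue = np.sum(seasonal_spue, axis=0)
    # Integrated annual presence probability, assuming seasonal independence
    p_any_season = 1.0 - np.prod(1.0 - seasonal_p, axis=0)
    return annual_spue, p_any_season

# Two hypothetical seasons on a 1 x 1 grid
annual, p_any = annual_maps(np.array([[[1.0]], [[2.0]]]),
                            np.array([[[0.5]], [[0.5]]]))
```

Two seasons with a 50% presence probability each give a 75% chance of the cell being occupied in at least one of them.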

Relative uncertainty of the annual SPUE and annual integrated presence probability maps was calculated as
the weighted average of the seasonal Stage I x II and Stage I relative uncertainty estimates, respectively, with
weights proportional to the overall frequency of species occurrence in that season. In this way, the annual 
relative uncertainty maps were weighted to reflect periods when the species was most prevalent. Mathematical 
details of the relative uncertainty calculations are given in Appendix 6.A (Section 6.A.11.). 
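A sketch of the frequency-weighted average follows. It assumes `seasonal_ru` holds one relative uncertainty map per season and `seasonal_freq` the corresponding overall frequency of species occurrence; both names are hypothetical, and the exact formulas are those given in Appendix 6.A.

```python
import numpy as np

def weighted_relative_uncertainty(seasonal_ru, seasonal_freq):
    """Frequency-weighted average of seasonal relative-uncertainty maps.

    seasonal_ru   : array (n_seasons, ny, nx)
    seasonal_freq : overall frequency of occurrence in each season
    """
    w = np.asarray(seasonal_freq, dtype=float)
    w = w / w.sum()                                   # normalize weights to sum to 1
    return np.tensordot(w, np.asarray(seasonal_ru, dtype=float), axes=1)

# Species seen three times as often in season 1 as in season 2
ru_annual = weighted_relative_uncertainty(np.array([[[0.4]], [[0.8]]]), [3, 1])
```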

"High", "Medium", and "Low" certainty classes (corresponding to low, medium, and high uncertainty, respectively) 
were assigned based on the relative uncertainty value (Section 6.A.11.). The High certainty class was defined 
by relative uncertainty values <0.50, the Medium certainty class was defined by relative uncertainty values >0.5 
but <0.65, and the Low certainty class was defined by relative uncertainty values >0.65. These certainty classes 
were based on inspection of the cross-validation error vs. relative uncertainty calibration plots presented as 
part of each species/group profile in Appendix 6.C. 
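The class assignment amounts to two cutoffs. In this sketch a value of exactly 0.50 falls into Medium and a value of exactly 0.65 into Low. That boundary handling is an assumption, since the text does not specify it. `certainty_class` is a hypothetical name.

```python
import numpy as np

def certainty_class(relative_uncertainty):
    ru = np.asarray(relative_uncertainty, dtype=float)
    # <0.50 -> High certainty; 0.50 to <0.65 -> Medium; >=0.65 -> Low (boundaries assumed)
    return np.where(ru < 0.50, "High", np.where(ru < 0.65, "Medium", "Low"))
```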



6.8.8. Hotspot analysis 

To examine "hotspots" (or "coldspots") of particularly high (or low) seabird abundance, species richness, and 
diversity (a combined measure of species richness and evenness), we combined results from the individually 
mapped species and species groups. We combined maps in two ways: 

1. Using only species that were individually mapped (see Table 6.2).

2. Using both species that were individually mapped and species groups (Table 6.2). 

Results of (1) and (2) were qualitatively similar and so we show only the second set of results, that is, maps 
derived from a combination of individually mapped species and species groups. Both seasonal and annual 
species/group maps were combined to produce seasonal and annual hotspot maps. For additional information 
on the rationale behind hotspot analyses, see Box 6.3. 

6.8.8.1. Abundance 

Abundance hotspots are defined as concentrations of large numbers of individual seabirds, regardless of 
species. To calculate annual abundance hotspot maps, predicted SPUE annual climatology maps were 
summed, with the sum taken at each grid cell location over all species and groups for which predictions were 
available. Seasonal maps were calculated in a similar way, and are presented in Appendix 6.D. 
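In array terms, the abundance hotspot map is a cell-wise sum over the stack of annual SPUE maps. The sketch assumes missing predictions are stored as NaN and skipped with `np.nansum`, so a cell with no prediction for any species or group sums to 0. `abundance_hotspot` is a hypothetical name.

```python
import numpy as np

def abundance_hotspot(annual_spue_maps):
    """Cell-wise total predicted SPUE over all species and groups.

    annual_spue_maps : array (n_taxa, ny, nx); NaN where a taxon has no prediction
    """
    return np.nansum(annual_spue_maps, axis=0)

# Two hypothetical taxa on a 1 x 2 grid; the first lacks a prediction in cell 2
total = abundance_hotspot(np.array([[[1.0, np.nan]], [[2.0, 3.0]]]))
```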

6.8.8.2. Species richness 

To calculate species richness hotspot maps, we chose an arbitrary threshold to define a species as functionally
present: a species was considered present if its predicted abundance in the final annual SPUE climatology map
was > T, where T is the 10th percentile of the observed relative abundance (SPUE) of the species when present,
divided by 10. After applying this threshold to each species map (i.e., values < threshold were set to 0 and
values > threshold were set to 1), the resulting binary presence/absence maps were summed to produce the
annual species richness hotspot map. Seasonal maps were produced in a similar way using the seasonal SPUE
climatologies (Stage I x II maps).

Since groups contain more than one species, they could potentially contribute more than 1 unit to species 
richness. An upper bound on group contribution to richness was calculated by repeating the summation after 
multiplying each binary group presence/absence map by the number of true species in the group (not including 
any unidentified types). The 'unidentified gull' group was excluded from all species richness analyses because 
it had no identifiable species. 

Final richness maps show the midpoint between the lower bound on species richness, assuming each group 
contributed just one species, and the upper bound, assuming all species in a group were present. These maps 
likely over-represent actual species richness observable at any given time and location, since not all species in 
a group are equally common, and since even common and abundant species are highly mobile and will not be 
observed in every survey even at the locations of their highest predicted abundance. They are only intended 
as a relative index of potential long-term species richness at each location (the number of species likely to be 
encountered over long periods of time, such as the ~9-year MBO CSAP study period). 
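The richness calculation, including the lower and upper bounds and their midpoint, can be sketched as follows. The sketch makes several assumptions:

- Maps are stacked one taxon per slice.
- `positive_obs[k]` holds the observed non-zero SPUE values for taxon k.
- `n_species[k]` is 1 for an individually mapped species and the number of true species for a group.
- The 'unidentified gull' group is excluded before the call.

`np.percentile` uses linear interpolation by default, which may differ from the percentile definition in the original Matlab code. All names are hypothetical.

```python
import numpy as np

def richness_map(spue_maps, positive_obs, n_species):
    """Midpoint of lower/upper species-richness bounds for each grid cell.

    spue_maps    : array (n_taxa, ny, nx) of predicted annual SPUE
    positive_obs : sequence of 1-D arrays, observed SPUE when present, per taxon
    n_species    : sequence of ints, true species per taxon (1 for a single species)
    """
    lower = np.zeros(spue_maps.shape[1:])
    upper = np.zeros(spue_maps.shape[1:])
    for k, obs in enumerate(positive_obs):
        # Functional-presence threshold T: 10th percentile of SPUE-when-present, / 10
        T = np.percentile(obs, 10) / 10.0
        present = (spue_maps[k] > T).astype(float)   # 1 = present, 0 = absent
        lower += present                             # each group counts as one species
        upper += present * n_species[k]              # group counts all its species
    return 0.5 * (lower + upper)

# Hypothetical example: one species (T = 1.0) and one 3-species group (T = 2.0)
maps = np.array([[[5.0, 0.5, 0.1]], [[5.0, 1.0, 3.0]]])
richness = richness_map(maps, [np.array([10.0, 10.0, 10.0]), np.array([20.0, 20.0])], [1, 3])
```

In the first cell both taxa exceed their thresholds, so the bounds are 2 and 4 and the mapped value is their midpoint, 3.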

6.8.8.3. Diversity 

Species richness alone can over-represent the effective diversity of an area if many species are present only 
at very low abundances relative to their abundance elsewhere. We found this to be the case for our maps of 
species richness, and so we also report the common Shannon diversity index (H') (Krebs, 1989). Because
this index requires the total number of species to remain the same to allow comparison from one location 
to another, species that were not present at a given location were assigned a very low abundance (SPUE = 
0.0001), chosen to be at least 10 times lower than any observed SPUE in the dataset. For this analysis, each 
group was counted as a single species, regardless of the number of species in the group. Unidentified gulls 
were included and counted as a single species. 
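A sketch of the Shannon index, H' = -sum(p_i ln p_i), with the 0.0001 floor for absent taxa follows. Two choices here are assumptions. The first is the natural logarithm, since the base is not stated in this section. The second is a boolean `present` mask marking which taxa count as present in each cell, for example the functional-presence threshold used for richness. `shannon_map` is a hypothetical name.

```python
import numpy as np

def shannon_map(spue_maps, present, floor=1e-4):
    """Shannon diversity H' per grid cell.

    spue_maps : array (n_taxa, ny, nx) of predicted SPUE
    present   : boolean array of the same shape; False -> taxon treated as absent
    floor     : SPUE assigned to absent taxa so every cell uses the same taxa count
    """
    x = np.where(present, spue_maps, floor)
    p = x / x.sum(axis=0, keepdims=True)      # relative abundance of each taxon
    return -np.sum(p * np.log(p), axis=0)     # natural-log Shannon index (assumed base)

maps = np.array([[[3.0]], [[3.0]]])
h_even = shannon_map(maps, np.ones_like(maps, dtype=bool))         # two equal taxa
h_one = shannon_map(np.array([[[3.0]], [[0.0]]]),
                    np.array([[[True]], [[False]]]))               # one taxon absent
```

With two equally abundant taxa H' equals ln 2. When one taxon is absent, the floor keeps the taxon count fixed while contributing almost nothing to diversity.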

Like species richness, the diversity index hotspot maps will likely overestimate the diversity observed at any 
instant in time. Instead, they are intended to represent the long-term potential diversity of an area, based on 
model predictions. 



6.8.8.4. Hotspot uncertainty 

As a general guide to the level of certainty about each of the hotspot maps (abundance, richness, and diversity), 
we calculated a weighted average of the relative uncertainty maps associated with each annual and seasonal 
predicted SPUE climatology map. Weights used were proportional to the frequency of the species in the 
corresponding season or year (similar to the weighting scheme used for seasonal maps described in Appendix 
6.A [Section 6.A.13.]). This resulted in the final hotspot relative uncertainty maps (see Appendix 6.D). 

"High", "Medium", and "Low" certainty classes (corresponding to low, medium, and high uncertainty, respectively) 
were assigned based on the relative uncertainty value in the same manner as for the other annual maps (see 
Section 6.8.7.). Hotspot certainty classes are presented as overlays on Figures 6.35-6.37. 

6.9. RESULTS 

Seasonal predictive modeling was carried out for 58 season/species combinations and for the 'no birds sighted' 
category in 4 seasons (Table 6.6). In all, 44 species were modeled in the 58 seasonal models. Fourteen species were
modeled individually, and 30 species were modeled as part of the 9 modeled species groups. These groups
also included sightings from 15 categories not identified to species level, but still identifiable to group level (all
sightings were identified at least to family level). Seventeen species/types were too rare in the study area to model: 3
Cormorant species, the South Polar Skua, 1 rare visitor/migrant species, and 4 unidentified types that fell into 
one of these three groups (Table 6.2). 

Overall, diagnostic statistics indicated that most models were successful in describing some aspects of species 
distribution, although model performance varied over space and from species to species. Environmental 
predictors contributed significantly to the predictive ability of most models. Figure 6.5 summarizes the relative 
importance of different environmental predictor variables across the seasonal predictive models. The relative 
importance of different model components (trend model, spatial model, 'white noise' error term) varied from 
Stage I to Stage II and among species/groups, although similarities in model structure were often observed 
across different seasons for a given species/group (Figure 6.6). 'White noise' refers to random variability that 
is not spatially structured and is not predictable based on available environmental variables; therefore, models 
with higher white noise components have less predictive power. It can be thought of as indicating a higher 
degree of expected variability around the mean if the same spot were visited repeatedly, even under identical 
environmental conditions. Model performance also varied, and any application of these models should consider 
the performance metrics most relevant to the application in question. Figure 6.7 summarizes several selected 
seasonal model diagnostic statistics. Table 6.7 summarizes some cross-validation performance diagnostics 
from seasonal predictive models. More detailed information about model performance is given in the species 
summaries that follow (Box 6.1) and in Appendix 6.C and Online Supplement 6.2.

Box 6.1 describes the Annual Predictive Model Summaries that were produced for each species and group. 
Maps of predicted long-term average annual relative abundance (SPUE) for each of the 14 individually modeled 
species are shown in Figures 6.8 to 6.21, and maps for the 9 modeled species groups are shown in Figures 
6.22 to 6.30. Three additional species groups (Cormorants, Skuas, and Rare visitors) had insufficient data to 
model; only point-estimates of SPUE at transect centroids are shown for these non-modeled species (Figures 
6.31-6.33). These are followed by a predictive model of an index of the frequency and extent of 15-minute 
surveys in which no seabirds were detected (Figure 6.34A) and the probability for each grid location of at least 
one 'no birds sighted' survey occurring during one of the four seasonal periods (Figure 6.34B). The index of 'no 
birds sighted' is measured in units of km² transect area in which no birds were detected per 15-minute survey.
Box 6.2 discusses the interpretation and potential utility of the 'no birds sighted' model. Following the species, 
group, and 'no birds sighted' models, hotspot analyses of abundance, richness, and diversity are presented 
(Figures 6.35 to 6.37) and discussed (Box 6.3). Finally, a point map is presented of sightings for species of 
particular concern (Figure 6.38). 

As described in Box 6.1, all maps are accompanied by a data summary table. Where predictive modeling was
done, predictor summary tables and diagnostic summary tables are also included. Box 6.1 describes the color
coding used in these tables to provide an overall assessment of the relative performance of a model.

Figure 6.5. Summary of environmental predictor variable importance in the seasonal predictive models. For each species or group modeled (horizontal
axis) and each potential predictor variable (vertical axis), bubbles are plotted for each season in which the predictor was included in the GLM trend
model (green=spring, red=summer, black=fall, blue=winter). The diameter of the bubble plotted for each season is proportional to the magnitude of the
standardized simple slope (sum of main effect slope and all interaction effect slopes in which a given predictor variable occurs), a measure of predictor
variable influence in the fitted trend model. A) Stage I Trend Model predictor importance. B) Stage II Trend Model predictor importance. For details on
simple slopes calculations see Appendix 6.A and Online Supplement 6.2. Online Supplement 6.2 also includes bar plots that indicate the sign of each
predictor's influence, information that is not included here (only absolute magnitudes are indicated by bubble size in this plot).

Figure 6.6. Summary of variance partitioning in model fits among the trend component, spatial component (residual variogram 'sill')
and white noise component (residual variogram 'nugget'), for A) Stage I (Occurrence Model), and B) Stage II (Abundance-when-
present Model). For details of variance component calculations, see Appendices 6.A and 6.C and Online Supplement 6.2.

Figure 6.7. Summary of annual cross-validation model diagnostic statistics. A) Stage I - Cross-validation AUC; reference line
is plotted at the threshold value of 0.5 below which the model has no predictive value. B) Stage II - cross-validation % correctly
predicted within parametric 1 S.D. confidence intervals; reference line is plotted at the theoretical target value of 68.3%.

Table 6.7. Summary of cross-validation diagnostic statistics for annual models*

DIAGNOSTIC STATISTICS**

SPECIES OR GROUP NAME | RankR | %1SD | AUC | p(AUC) | MAPE | Rel.MAE | Rel.RMSE | Rel.Bias

Species
Black-legged Kittiwake | 0.02 | 84.0% | 0.47 | 0.7011 | 145% | 44% | 78% | 14%
Common Loon | -0.04 | 33.3% | 0.77 | 0.0000 | 258% | 40% | 67% | 41%
Common Tern | 0.26 | 46.7% | 0.77 | 0.0000 | 579% | 24% | 34% | 18%
Cory's Shearwater | 0.20 | 78.6% | 0.64 | 0.0000 | 112% | 22% | 69% | 5%
Dovekie | 0.20 | 62.3% | 0.71 | 0.0000 | 216% | 5% | 15% | 5%
Great Black-backed Gull | 0.32 | 91.4% | 0.77 | 0.0000 | 134% | 33% | 46% | 7%
Great Shearwater | 0.07 | 75.6% | 0.65 | 0.0000 | 221% | 27% | 65% | -1%
Herring Gull | 0.13 | 82.8% | 0.56 | 0.3192 | 176% | 43% | 70% | -11%
Laughing Gull | 0.33 | 76.1% | 0.89 | 0.0000 | 161% | 15% | 28% | 0%
Northern Fulmar | 0.21 | 53.3% | 0.80 | 0.0000 | 396% | 60% | 101% | 46%
Northern Gannet | 0.17 | 87.9% | 0.64 | 0.0095 | 259% | 32% | 48% | 0%
Pomarine Jaeger | 0.30 | 66.7% | 0.64 | 0.0012 | 590% | 11% | 15% | 9%
Sooty Shearwater | 0.28 | 72.4% | 0.62 | 0.0025 | 135% | 19% | 27% | 21%
Storm-Petrels, less common | 0.24 | 61.5% | 0.63 | 0.0072 | 306% | 18% | 25% | 22%
Wilson's Storm-Petrel | 0.29 | 75.2% | 0.68 | 0.0000 | 396% | 45% | 95% | -17%
Mean | 0.20 | 69.9% | 0.68 | n/a | 272% | 29% | 52% | 11%

Groups
Alcids, less common | 0.22 | 76.7% | 0.59 | 0.0509 | 158% | 23% | 33% | 26%
Coastal Waterfowl | 0.20 | 64.3% | 0.77 | 0.0000 | 395% | 22% | 66% | 14%
Jaegers | -0.16 | 65.4% | 0.62 | 0.0213 | 471% | 13% | 25% | 10%
Phalaropes | 0.16 | 70.6% | 0.76 | 0.0000 | 908% | 23% | 77% | -9%
Shearwaters, less common | -0.05 | 76.5% | 0.51 | 0.3915 | 156% | 21% | 32% | 26%
Small gulls, less common | 0.16 | 77.8% | 0.72 | 0.0011 | 131% | 27% | 32% | 34%
Terns, less common | 0.52 | 61.9% | 0.67 | 0.0047 | 874% | 26% | 39% | 28%
Unidentified Gulls | 0.14 | 56.8% | 0.62 | 0.0173 | 291% | 21% | 27% | 26%
Mean | 0.15 | 68.7% | 0.66 | n/a | 423% | 22% | 41% | 19%

'No birds sighted' | 0.13 | 70.6% | 0.54 | 0.2539 | 63% | 297% | 466% | 433%

*Cross-validation was performed by aggregating data and predictions in 10x10 cell (~9x9 km) bins. This was
necessary because cross-validation data locations did not match up exactly from season to season. See
Appendices 6.A and 6.C for details.
**Diagnostic statistics are explained in Box 6.1, Table B.

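The 10x10 cell aggregation mentioned in the Table 6.7 footnote can be sketched with a reshape. The sketch assumes a regular 2-D grid whose partial edge blocks are simply dropped, and it uses the block mean (ignoring NaN cells) as the aggregate. This section does not state which statistic or edge handling the report used, so both are assumptions. `block_aggregate` is a hypothetical name.

```python
import numpy as np

def block_aggregate(grid, block=10, func=np.nanmean):
    """Aggregate a 2-D grid into non-overlapping block x block bins (assumed mean)."""
    ny, nx = grid.shape
    nby, nbx = ny // block, nx // block
    trimmed = grid[: nby * block, : nbx * block]        # drop partial edge blocks
    tiles = trimmed.reshape(nby, block, nbx, block)     # axes: (bin_row, r, bin_col, c)
    return func(tiles, axis=(1, 3))

grid = np.arange(400, dtype=float).reshape(20, 20)
binned = block_aggregate(grid)            # 2 x 2 bins of 10 x 10 cells
```

Applying the same binning to both the held-out observations and the predictions lets them be compared bin by bin even when individual sample locations differ between seasons.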



Box 6.1. Predictive modeling summary guide.
PREDICTIVE MODELING SUMMARY GUIDE

On the following pages the annual predictive models for each species and species group are summarized in 
a standardized layout like the example above. The elements of this model summary are as follows: 

A. Common name (Scientific name) of the species; or Group name (number of species) 

B. Annual climatological predicted relative abundance (Sightings Per Unit Effort [SPUE], No. indiv./km²/15-minute
survey). Predicted SPUE is indicated by the color gradient, which is logarithmically scaled as
indicated by the color bar at right. Certainty classes are overlaid on model predictions, as indicated in the 
legend of the first figure in the series (Figure 6.8). The NY planning area and the shelf edge are denoted by 
a magenta dotted line and a solid black line, respectively (see legend in Figure 6.8). 

C. A photo of an individual of the species, or of one representative species in the group. 

D. Data summary table. The input data used to fit and test the model are summarized by season and over 
all seasons. "N obs." indicates the number of independent surveys in which a species or group was sighted; 
"Freq. (%)" is the relative frequency of sighting; the rest of the table reports statistics of non-zero data only
(relative abundance when the species was present). 

E. Predictor summary table. This table summarizes the variables included in each of the seasonal trend 
GLM models for Occurrence (Stage I) and Abundance (Stage II). P-values for Bonferroni-corrected post-hoc 
significance tests of each effect (minimum p-value considering main effect and any interactions in which the 
variable occurs) are indicated by color shading (Box Table A). 



Variables that were not included are indicated by white squares. Seasons that were not modeled are indicated by
an "X" through the corresponding columns of the table.



Box 6.1: Table A. Color shading for P-values.

Color | P-value
Red | p>0.1
Orange | p<0.1
Yellow | p<0.05
Light green | p<0.01
Green | p<0.001


F. Diagnostic summary table. The first row of this table reports the percentage of the NY study area that falls
into each certainty class ("Low", "Med", "High" columns), and the certainty class that would be assigned based
on the average relative uncertainty value calculated over the whole NY study area ("ALL" column). The rest of
the table reports cross-validation error statistics for each certainty class (based on the 50% of the data that
were withheld from model fitting). Cells are color coded based on whether the results of each diagnostic can be
considered "excellent" (green), "fair" to "good" (yellow), or "poor" (red). See Box 6.1: Table B for a list of the
diagnostic statistics, notes on how they were calculated, and values used to define the "excellent", "fair to good",
and "poor" categorization. Note that the color-coding of diagnostic statistics should be considered a general
guideline to model performance; the usefulness of model predictions in any specific case often depends on
the details of the application. One diagnostic statistic may be important for one application, but irrelevant for
another.



Box 6.1: Table B. Description of diagnostic statistics and color-coding of diagnostic tables. Cutoff values for "Poor", "Fair", "Good",
and "Excellent" are subjective and qualitative categories and are intended as an interpretative aid, not a formal statistical test.

Diagnostic statistic | Description | Calculated with | Poor | Fair to Good | Excellent
RankR | Spearman rank correlation coefficient of observed vs. predicted | Non-zero cross-validation data | x<0.05 | 0.05<x<0.3 | x>0.3
%1SD | Percent of observations within +/- 1 standard deviation (or standard error) confidence intervals of predicted value; theoretical expectation is 68% | Non-zero cross-validation data | x<20% | 20%<x<50% | x>50%
AUC | Area Under Curve statistic; area under the receiver operating characteristic (ROC) curve | All binary cross-validation data (presence/absence); maximum predicted probability in bin used as ROC classifier | x<0.55 | 0.55<x<0.75 | x>0.75
p(AUC) | p-value for significance test of the AUC statistic | Non-zero cross-validation data | | 0.20>x>0.01 | x<0.01
MAPE | Mean Absolute Percentage Error = mean(abs(obs-pred)/obs)*100% | Non-zero cross-validation data | x>150% | 150%>x>50% | x<50%
Rel.MAE | Relative Mean Absolute Error (expressed as a % of the 90th percentile - 10th percentile range of the data) | All cross-validation data | x>100% | 100%>x>25% | x<25%
Rel.RMSE | Relative Root Mean Square Error (expressed as a % of the 90th percentile - 10th percentile range of the data) | All cross-validation data | x>100% | 100%>x>25% | x<25%
Rel.Bias | Relative Absolute Bias (expressed as a % of the 90th percentile - 10th percentile range of the data) | All cross-validation data | x>100% | 100%>x>25% | x<25%
Bias Dir. | Sign (+ or -) of the Bias statistic. + indicates predicted value tends to be greater than observed value. | All cross-validation data | | n/a | n/a
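To make the definitions in Table B concrete, the sketch below computes several of the listed statistics for one set of cross-validation observations and predictions, then applies the Table B cutoffs. It makes several assumptions:

- The "non-zero" statistics (RankR, %1SD, MAPE) use cells where the observation is > 0, and the relative statistics use all cells.
- The 90th-10th percentile spread is taken over all observations.
- Strict inequalities are used at cutoff boundaries.
- Spearman correlation is computed as the Pearson correlation of average ranks.

AUC and p(AUC) are not computed here. All function names are hypothetical.

```python
import numpy as np

def avg_rank(a):
    """1-based ranks; tied values share their average rank."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tie = a == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def diagnostics(obs, pred, pred_sd):
    obs, pred, sd = (np.asarray(v, dtype=float) for v in (obs, pred, pred_sd))
    err = pred - obs                                   # + means over-prediction
    spread = np.percentile(obs, 90) - np.percentile(obs, 10)
    nz = obs > 0                                       # non-zero observations only
    mean_err = err.mean()
    return {
        "RankR": np.corrcoef(avg_rank(obs[nz]), avg_rank(pred[nz]))[0, 1],
        "%1SD": 100 * np.mean(np.abs(err[nz]) <= sd[nz]),
        "MAPE": 100 * np.mean(np.abs(err[nz]) / obs[nz]),
        "Rel.MAE": 100 * np.mean(np.abs(err)) / spread,
        "Rel.RMSE": 100 * np.sqrt(np.mean(err ** 2)) / spread,
        "Rel.Bias": 100 * abs(mean_err) / spread,
        "Bias Dir.": "+" if mean_err > 0 else ("-" if mean_err < 0 else "0"),
    }

# (poor test, excellent test) per statistic; anything else is "Fair to Good"
CUTOFFS = {
    "RankR":    (lambda x: x < 0.05, lambda x: x > 0.3),
    "%1SD":     (lambda x: x < 20,   lambda x: x > 50),
    "AUC":      (lambda x: x < 0.55, lambda x: x > 0.75),
    "MAPE":     (lambda x: x > 150,  lambda x: x < 50),
    "Rel.MAE":  (lambda x: x > 100,  lambda x: x < 25),
    "Rel.RMSE": (lambda x: x > 100,  lambda x: x < 25),
    "Rel.Bias": (lambda x: x > 100,  lambda x: x < 25),
}

def rate(name, value):
    """Map a diagnostic value onto the Table B categories."""
    poor, excellent = CUTOFFS[name]
    if poor(value):
        return "Poor"
    if excellent(value):
        return "Excellent"
    return "Fair to Good"
```

For example, under these cutoffs the Black-legged Kittiwake row of Table 6.7 (RankR 0.02, %1SD 84.0%) rates Poor on RankR and Excellent on %1SD.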



The maps shown in the main section of this report (Figures 6.8 to 6.37) are annual climatological maps. Full model 
profiles including seasonal maps for each species and group (and the "no birds sighted" category) are given in 
Appendix 6.C. These profiles include overlays of original data points on model predictions, the seasonal predictions 
that went into each annual map, more detailed quantitative maps of relative uncertainty, details on the structure of 
each model, histograms showing the temporal occurrence of each species/group, and results of the cross-validation 
accuracy assessment for each model. Similar seasonal profiles for hotspot analyses are given in Appendix 6.D. 
Readers interested in application of the models presented in this report are urged to consult Appendices 6.C and
6.D to evaluate seasonal variation in model predictions, and the performance of each model in independent
cross-validation.

Below, very brief notes are given on predictive model results for each species and group, in relation to species 
life history and occurrence in the study area. These notes are intended only as an initial introduction and 
qualitative description of model results; the maps and diagnostics presented in figures, tables, appendices, 
and online supplements convey more detailed results for each species. 

6.9.1. Species notes 

Black-legged Kittiwake (Rissa tridactyla)

The Black-legged Kittiwake (Figure 6.8, Tables 6.9-11) is one of the most abundant and frequently sighted 
species in the study area. The majority of sightings were made between late fall and early spring. The species 
breeds along coasts in the Arctic and sub-Arctic in summer, and winters out at sea along the Pacific and 
Atlantic coasts, including the study area. Predicted abundance was highest in the north and central parts of 
the study area, especially south of the eastern tip of Long Island, in the Hudson Shelf Valley vicinity, and south 
of Nantucket Shoals. This pattern was fairly consistent across seasons. Model predictive ability was marginal,
with a high proportion of white noise (unpredictable random variation) and poor ROC performance. However,
some of the other diagnostic statistics indicate acceptable performance, especially in the "medium" certainty 
class. 

Common Loon (Gavia immer) 

The Common Loon (Figure 6.9, Tables 6.12-14) is a coastal species during the non-breeding season (i.e., 
outside of summer), and this is reflected in model predictions. Hotspots of predicted abundance occur all along 
the shore, especially in the vicinity of New York Bay, the NJ shore, Long Island, Block Island, and Martha's 
Vineyard. Loons are fairly commonly sighted nearshore in spring, fall, and winter, and range furthest offshore 
in spring; otherwise patterns were consistent across seasons. Model predictive ability was generally good, 
with low white noise, excellent overall ROC performance, and excellent performance in the 'high' certainty 
class that dominated the study area. The small white (blank) spot southeast of Block Island is a place where 
predictions were masked out because they were both extremely high (beyond the range of the data) and 
extremely uncertain (beyond the threshold over which we considered predictions too uncertain to map, and 
unsupported by any nearby data points). The high values around the edges of this blank spot are also fairly 
uncertain. Expert judgement is that this apparent hotspot is not likely to be real (D. Veit and P. Paton, pers.
comm.). Also, because this is a highly coastal species, and the coverage of the Manomet dataset is poor
nearshore, the abundance estimates for this species may be biased low. For example, the RI SAMP study
(Paton et al., 2010) found that Common Loons were relatively common in RI waters in the winter.

Common Tern (Sterna hirundo) 

Although the Common Tern (Figure 6.10, Tables 6.15-17) is the most widespread tern species in North America,
it is relatively infrequent in the study area dataset (<3% prevalence). This is likely partially due to the poor 
coverage of the MBO CSAP data in very nearshore areas: the species prefers coastal habitats along the 
ocean, rivers and lakes, which abut, but are not part of the study area. The largest Common Tern colony in 
the region (15,000+ individuals) is on Great Gull Island at the east end of Long Island Sound. There is an 
influx of Common Terns into more offshore waters in April and May, with peak abundance in the summer 
breeding months and trailing off into December. The model predictions reflect this seasonal pattern and the 
expected nearshore distribution. The species is listed as Threatened by New York. Predicted abundance is 
highest nearshore, especially south of Long Island. The area of high predicted abundance near Jamaica Bay 
is well-supported by data offshore, but not by any nearshore data points, and so should be interpreted with
caution in light of the data distribution shown in Appendix 6.C. The spatial pattern is consistent across seasons.
Model predictive ability was good, with low white noise, excellent overall ROC performance, and excellent 
performance in the 'high' certainty class that dominated the study area. Model predictions were more uncertain 
and variable in fall, when the species was less abundant. 

Cory's Shearwater (Calonectris diomedea) 

Cory's Shearwater (Figure 6.11, Tables 6.18-20) is a large seabird found across the Northern Atlantic and 
seldom seen near land except during breeding. Since it breeds on islands in the eastern North Atlantic and 
Mediterranean, it is not regularly seen close to shore in North America. It is commonly found where water 
masses mix. Most sightings in the study area are made during the summer and fall. Abundance predictions 
peak offshore and the abundance trend follows a general southwest to northeast pattern along the shelf edge, 
with a more expansive distribution to the north and east. The apparent hotspots south of Block Island and near 
Nantucket Shoals are not well-supported by data (they are based primarily on extrapolation of environmental 
relationships) and therefore should be considered hypothetical until tested with additional survey data. Model 
predictive ability indicated by error statistics and ROC analysis is fair to good where certainty is high. However, 
the white noise component of the model is moderately high indicating a high degree of unpredictable random 
variability in abundance at any given location. This is consistent with findings of the more recent Rhode Island 
SAMP study, which found significant interannual variation in the abundance of this species in offshore areas 
(Paton et al., 2010). It should also be noted that this species is attracted to fishing vessels and may be 
influenced by fishing patterns. 

Dovekie (Alle alle) 

Dovekies (Figure 6.12, Tables 6.21-23) are almost strictly pelagic, coming ashore only to breed on cliffs in 
areas far north of the NY Bight. Sightings are uncommon in the study area except offshore in winter, but 
when Dovekies are seen, they exhibit clear spatial and temporal patterns. Dovekies are most common in the 
winter months and there is an obvious preference for warmer waters above the shelf slope and in the middle 
of the study area, northwest of Hudson Canyon. This is consistent with the Dovekie's tendency to concentrate in
this region near temperature fronts and aggregations of copepods and similarly sized zooplankton (D. Veit, 
pers. comm.). Model predictive ability in ROC analysis was fair to good; abundance when sighted was highly 
variable (high Stage II white noise). This species may recently have increased in abundance (e.g., Paton et al., 
2010) and the patterns depicted here should be compared to more recent data. 

Great Black-backed Gull (Larus marinus)

The Great Black-backed Gull (Figure 6.13, Tables 6.24-26) is a coastal species found in the North Atlantic
and Palearctic, breeding on coasts in North America and Europe. The study area is towards the southern 
limit of this species' distribution (breeds south to North Carolina; ranges to Florida in winter), which may 
explain why sightings are more common in the northern part of the study area. Most sightings are made 
between July and November during breeding months and there are confirmed observations of breeding on 
Long Island. Relatively certain predictions of high abundance were made throughout the northeast portion of 
the study domain, including the vicinity of Block Island southeast to the shelf edge, and Martha's Vineyard 
south to the shelf edge. High abundances were also predicted near the coast and along the shelf edge. The 
onshore-offshore distribution varied somewhat with season (Appendix 6.C). Though highly variable (white 
noise component was high), presence predictions were generally excellent and error statistics were acceptable 
even for the 'low' certainty class. 

Great Shearwater (Puffinus gravis) 

The Great Shearwater (Figure 6.14, Tables 6.27-29) is one of only a few species found in the NY Bight that 
migrate from breeding grounds in the Southern Hemisphere to wintering grounds in the Northern Hemisphere. 
Migration follows a quasi-circular route moving up the western edge of the Atlantic in spring-summer and 
returning along the eastern Atlantic. Sightings in the study area occur primarily in summer and fall. Low 
abundance was predicted over most of the study area, suggesting that the birds remain primarily offshore 
during migration and generally follow the shelf edge. A broad, moderate concentration of abundance occurs in 
the center of the NY planning area, just south of Long Island extending south and southeast to the shelf edge 
and east past Nantucket Shoals. This pattern of abundance is similar to that of other shearwaters (e.g., Cory's
Shearwater). Presence predictions were fair to good; abundance when seen was highly variable (high white
noise). Error statistics were fair to good in higher certainty classes.

Herring Gull (Larus argentatus smithsonianus)

The Herring Gull (Figure 6.15, Tables 6.30-32) is a very widespread and abundant seabird in the study area. 
It is a year-round resident and breeds along the coasts. Especially high abundance occurs at the mouth 
of the Hudson River, along the south shore of Long Island, south of Nantucket at the eastern edge of the 
domain, and at other scattered locations throughout the domain. This species is attracted to ships, and the high
abundances off New York Harbor and south of Martha's Vineyard may be due to birds aggregating at fishing
trawlers (and thus not reliable long-term hotspots). A clear and dramatic change in spatial distribution occurs
over the course of a year (see Appendix 6.C). Sightings are distributed across the shelf during winter, spring
and fall, but are rare more than 50 km from shore during the summer breeding months. Very high white noise
components indicate a highly variable, transient pattern of distribution, and error statistics are correspondingly 
poor; however, the model predicted the range of variability well (%1SD statistics) and ROC statistics were 
excellent in the 'medium' certainty class. 

Laughing Gull (Leucophaeus atricilla) 

The Laughing Gull (Figure 6.16, Tables 6.33-35) breeds along the Atlantic coast of North America, with the greatest abundances
seen south of the study area, though it is fairly common in NY. The study area is relatively close to the northern
limit of the Laughing Gull breeding distribution (which extends as far north as Nova Scotia and New Brunswick).
Predictions of highest abundance and occurrence are made within 50 km of shore and near the Hudson River
and New Jersey coasts, and are consistent across spring, summer, and fall seasons. Sightings in the winter
and early spring are rarer; this species winters in the mid-Atlantic and to the south. Model predictions of
presence are excellent, though abundance when seen is fairly variable. Predictions in the 'high' certainty class
are excellent; 'medium' certainty predictions are weaker.

Northern Fulmar (Fulmarus glacialis) 

The Northern Fulmar (Figure 6.17, Tables 6.36-38) is a gull-like relative of albatrosses and shearwaters. Most 
sightings are made between January and June (winter-spring). The majority of sightings are offshore, between 
75 km from shore and the shelf edge, although sightings occur somewhat closer to shore in the northeast 
of the study area, offshore of Martha's Vineyard and Nantucket. Patterns are consistent across seasons. 
Performance in ROC analysis was good, and diagnostics are excellent for the high certainty class which 
covers about half the study area. 

Northern Gannet (Morus bassanus)

The Northern Gannet (Figure 6.18, Tables 6.39-41) is one of the most abundant and widespread species 
of seabird in the study area. It is frequently sighted between October and April and exhibits distinct spatial 
patterns among seasons, with more offshore sightings in spring. In the winter most individuals are seen in the 
mid-to-inner shelf towards the southern part of the domain, in the spring throughout the domain and especially 
along the shelf edge, and in the fall in the inner to mid-shelf with peaks near Nantucket Shoals and Long 
Island. This species is known to aggregate around areas of fishing activity, and so its spatial distribution may have
changed since the 1980s with shifting patterns of fishing effort. In particular, large aggregations of Northern
Gannets were observed in association with foreign factory trawlers in the 1980s, and similar aggregations
have not been observed since those boats stopped frequenting the region (D. Veit, pers. comm.). Thus, the
hotspots near the shelf break require confirmation from more recent data before being considered persistent
aggregations. Both the two-pulse fall and spring migrant pattern and the tendency to aggregate around fishing boats
have been confirmed by recent studies in the area (Paton et al., 2010). Model predictive performance is poor 
to fair, depending on certainty class; the Stage II white noise component (unpredictability in abundance when 
present) is very high. 
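The species accounts repeatedly separate the predictability of occurrence from the predictability of abundance when present (Stage I and Stage II). As a minimal sketch, assuming the two stages combine multiplicatively as in a generic hurdle (delta) model, the SPUE index at a grid cell is the product shown below. The function and its inputs are hypothetical illustrations, not the report's modeling code.

```python
def expected_spue(p_present: float, spue_if_present: float) -> float:
    """Hypothetical hurdle-model sketch: combine a Stage I occurrence
    probability with a Stage II prediction of SPUE when present."""
    if not 0.0 <= p_present <= 1.0:
        raise ValueError("p_present must lie in [0, 1]")
    if spue_if_present < 0.0:
        raise ValueError("SPUE cannot be negative")
    # Unconditional mean = P(present) * E[SPUE | present]
    return p_present * spue_if_present


# Illustrative values only: birds present on 25% of surveys,
# averaging 12 indiv/km2/15-min when present.
print(expected_spue(0.25, 12.0))  # 3.0
```

Because the product inherits error from both factors, a very high Stage II white noise component limits the reliability of the SPUE map even where occurrence is well predicted.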

Pomarine Jaeger (Stercorarius pomarinus) 

The Pomarine Jaeger (Figure 6.19, Tables 6.42-44) is a skua occasionally seen offshore of New York in the 
fall (<4% frequency). Very few sightings are made in other seasons. Areas of highest abundance are predicted 
along the shelf edge, and south of Nantucket Shoals. It is possible that these aggregations are due to the
presence of fishing trawlers and are not reliable long-term hotspots (D. Veit, pers. comm.). This species is
seldom if ever observed close to shore. Model predictive performance was excellent for the high certainty
class, but the model-predicted uncertainty tended to underestimate the error seen in cross-validation, due to the highly
skewed abundance distribution.

Sooty Shearwater (Puffinus griseus) 

The Sooty Shearwater (Figure 6.20, Tables 6.45-47) breeds on islands off southern South America and New 
Zealand and spends summers in the North Pacific and Atlantic. Most sightings in the study area are between 
April and July (spring-summer). Predictions show a clear preference for the shelf edge during the spring, 
similar to other shearwaters in the study area. Predictions of high abundance are also made in discrete areas 
in the spring and summer south of central Long Island and south of Nantucket Island, respectively. Model 
performance was generally fair, but excellent in the ~20% of the study area with 'high' certainty.

Wilson's Storm-Petrel (Oceanites oceanicus) 

Wilson's Storm-Petrel (Figure 6.21, Tables 6.48-50) is one of the most common species seen in summer in the
study area, and is also present in spring and fall. It breeds in Antarctic and sub-Antarctic seas, but ranges to 
the Northern Pacific, Atlantic and Indian Oceans during summer months. The majority of sightings in the study 
area are between May and September (spring, summer, and fall). Areas of high abundance are predicted fairly 
uniformly over the shelf, increasing offshore. Abundances extend into the nearshore in the northern part of the 
study area in summer. Though white noise is high, indicating a high degree of unpredictability in abundance, 
ROC analysis showed good predictive ability. Model diagnostics in the 'high' certainty class indicated excellent 
performance. 

6.9.2. Group notes 

Less Common Alcids 

The Less Common Alcids group (Figure 6.22, Tables 6.51-53) includes the Atlantic Puffin, Common Murre, 
Razorbill, Thick-billed Murre, and unidentified species in the Family Alcidae. These species are generally rare 
in the study area, though frequency of sightings reaches 5% in winter. They occur in winter and spring, and 
very rarely in fall. Predictions are uncertain and cross-validation results are poor. Generally there appears to 
be an area of elevated abundance along the shelf, especially in the northeast of the domain, south of Nantucket
Shoals. ROC analysis shows very little predictive success, though more certain predictions do have better
cross-validation error statistics, indicating that when the group is present the abundance predictions are fairly
good. The Rhode Island SAMP study found that the Razorbill and Common Murre have become much more
common in this region in recent years than in earlier surveys, so analysis of newer survey data will be important
to an improved assessment of this group (Paton et al., 2010).

Coastal Waterfowl 

Coastal Waterfowl (Figure 6.23, Tables 6.54-56), as defined here, are a diverse group including scoters, ducks, 
mergansers, eiders and other waterfowl in the family Anatidae, plus Red-throated Loons (family Gaviidae). 
We note that loons are not generally considered to be waterfowl (usually this term refers to species in the 
family Anatidae) but are included here because the Red-throated Loon sightings in the Manomet dataset were 
spatio-temporally similar to the true waterfowl sightings in the region, and Red-throated Loons were not seen 
enough to model separately (likely due in part to their low detectability; these birds tend to dive when moving 
ships are approaching [P. Paton, pers. comm.]). These species occur near coastlines and islands throughout 
the study area, both inside and outside of Long Island Sound, northeast around Block Island, Nantucket, and 
Martha's Vineyard, and southwest along the NJ shore. Highest abundances are seen in spring. Distributions 
are generally consistent across seasons. Statistically, model performance is generally excellent, especially 
with regard to occurrence. Error statistics are good in the high certainty class, except the model-predicted 
confidence intervals under-predict error (the %1SD statistic is well below its theoretical target of 68.3%). 
However, there are significant caveats. Stage II white noise is high, indicating that although occurrence is highly 
predictable, the observed abundance when seen exhibits a high degree of unpredictable random variability. 
Moreover, because the Manomet dataset had very few nearshore surveys in winter when seaducks are most 
abundant, and also because Red-throated Loons and seaducks tend to avoid ships, the winter estimate of 
abundance for this group is likely to be a severe underestimate. Finally, because this group encompasses a
large number of species, species-specific inferences cannot be made. If particular species of waterfowl are
of concern, additional data will be necessary to facilitate species-specific modeling. We note that much better
data sources exist for wintering seaducks (e.g., Zipkin et al., 2010).
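The %1SD statistic acts as a calibration check on the model-predicted confidence intervals. A minimal sketch, assuming %1SD is the percentage of held-out (cross-validation) observations that fall within one model-predicted standard deviation of the prediction: under normally distributed errors the expected value is about 68.3%, so values well below that target mean the predicted intervals are too narrow. The function name and example data are hypothetical.

```python
import math

# Expected coverage of +/-1 SD under a normal error model, P(|Z| <= 1)
TARGET_PCT = 100.0 * math.erf(1.0 / math.sqrt(2.0))


def pct_within_1sd(observed, predicted, predicted_sd):
    """Percentage of held-out observations within +/-1 predicted SD."""
    if not (len(observed) == len(predicted) == len(predicted_sd)):
        raise ValueError("inputs must have equal length")
    if not observed:
        raise ValueError("need at least one observation")
    hits = sum(
        1
        for y, mu, sd in zip(observed, predicted, predicted_sd)
        if abs(y - mu) <= sd
    )
    return 100.0 * hits / len(observed)


# Hypothetical cross-validation data: one large miss out of four
print(pct_within_1sd([1, 2, 10, 3], [1.5, 2, 4, 3.2], [1, 1, 1, 1]))  # 75.0
print(round(TARGET_PCT, 1))  # 68.3
```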

Jaegers 

Jaegers (Figure 6.24, Tables 6.57-59) are rare in the study area, only exceeding a frequency of 0.5% in fall 
(when they reach a frequency of 2%). They are occasional in spring and summer and absent in winter. The 
only season for which sufficient data were available to model was fall. The model performs fairly well for high 
certainty areas, given limited data, but the high abundances predicted at the edges of the domain (along the 
shelf edge) have low certainty. ROC analysis indicates high sensitivity but with a high false positive rate. Other 
cross-validation statistics suggest caution is necessary in applying this model. 
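The sensitivity and false positive rate cited for the Jaegers model come from comparing predicted occurrence with held-out presence/absence records. A minimal sketch of those two quantities at a single probability threshold, plus the area under the ROC curve (AUC) computed by pairwise ranking; the function names, threshold, and data are hypothetical, and the report's exact ROC procedure may differ.

```python
def sensitivity_fpr(observed, predicted_prob, threshold):
    """Return (sensitivity, false positive rate) at one threshold."""
    tp = fn = fp = tn = 0
    for present, p in zip(observed, predicted_prob):
        called_present = p >= threshold
        if present and called_present:
            tp += 1
        elif present:
            fn += 1
        elif called_present:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return sensitivity, fpr


def roc_auc(observed, predicted_prob):
    """AUC as the probability that a random presence outranks a random
    absence (ties count one half)."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    if not pos or not neg:
        raise ValueError("need both presences and absences")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


obs = [True, True, False, False, False]
prob = [0.9, 0.6, 0.7, 0.4, 0.2]
print(sensitivity_fpr(obs, prob, 0.5))  # (1.0, 0.3333333333333333)
print(roc_auc(obs, prob))
```

Lowering the threshold raises sensitivity but also the false positive rate, which is the trade-off described for the Jaegers diagnostics.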

Phalaropes 

Phalaropes (Figure 6.25, Tables 6.60-62) were only present at high enough frequency to model in spring.
They are pelagic in distribution, concentrating at the shelf edge, especially in the central and eastern part of
the domain. Model performance is good to excellent, especially in the high certainty class that covers half the
domain. Model-predicted uncertainty bounds tend to underestimate observed variability in abundance when
seen.

Less Common Shearwaters 

The Less Common Shearwaters (Figure 6.26, Tables 6.63-65) group includes the Manx Shearwater, Audubon's 
Shearwater and unidentified species in the Family Procellariidae. Most sightings occur in the spring, summer 
and fall, with a peak in summer. Highest abundance is predicted in summer along the eastern end of Long Island 
and Block Island and near Nantucket Shoals, though uncertainty is high for some of these areas. Otherwise, 
scattered sightings are made offshore out to the shelf edge throughout the study area in spring, summer, 
and fall. Audubon's Shearwaters are more common offshore, where they are mixed with Manx Shearwater
sightings. Manx Shearwaters dominate nearshore sightings off the eastern tip of Long Island. Abundance and 
frequency are generally greater in the center and northeast than the southwest of the region. However, model 
performance in cross-validation is poor and results of this model should be used with caution. 

Less Common Small Gulls 

The Less Common Small Gulls (Figure 6.27, Tables 6.66-68) group includes the Ring-billed Gull and 
Bonaparte's Gull. The group is fairly rare in the study area (<4% frequency) and is most prevalent in fall and winter. Both
of these species are migrants passing through the area in these seasons; summer breeding grounds are in 
boreal North America. High abundances are predicted along the Hudson shelf valley and to its south, and near 
the east end of Long Island. Insufficient sightings were made in very nearshore coastal areas to characterize 
distribution accurately, as indicated by the high uncertainty in these areas. Spatial distribution was consistent 
across seasons. Model performance in cross-validation was fair. 

Less Common Storm-Petrels 

The Less Common Storm-Petrels group (Figure 6.28, Tables 6.69-71) includes the Leach's Storm-Petrel,
Band-rumped Storm-Petrel, White-faced Storm-Petrel (rarely), and unidentified species in the Family Hydrobatidae.
Species in this group are very rare except in summer, when frequency of sighting approaches 5%. Areas of 
highest predicted abundance are scattered offshore along the shelf edge, slightly higher toward the south. 
Distribution is fairly consistent across seasons, with a subtle southward shift in summer. Occasional sightings 
occur at the east end of Long Island, primarily in summer, and in the Hudson Shelf Valley vicinity. Leach's 
Storm-Petrel was the most common species sighted and the only one positively identified in the nearshore. 
Overall, model performance is fair to poor for SPUE, but presence prediction diagnostics are good for the 
'high' certainty class. A very high Stage II white noise component means abundance when seen is very hard
to predict. 

Less Common Terns 

The Less Common Terns group (Figure 6.29, Tables 6.72-74) includes the Roseate, Least, Royal, Arctic, Sooty, 
Bridled, Caspian, and Forster's Terns and unidentified species in the Family Sternidae. The first two species 
are listed as Endangered and Threatened, respectively, by New York. The Roseate Tern is also federally listed
as Endangered by the USFWS. Most sightings are in the summer breeding months and are within 50 km of
shore. Cross-validation shows presence/absence predictions are acceptable, but abundance predictions are
poor except for the most certain areas (generally places far offshore where the species are virtually certain to
be absent). Given the sample sizes used to fit these models and the marginal performance statistics, caution 
should be used in applying SPUE predictions; presence/absence predictions can be used, but should also be 
treated with some caution. Caution should also be exercised because this group lumps many species, some 
of which are known to have different spatial distributions, and because the group contains more "Unidentified 
Tern" sightings than positively identified species sightings. These model results should be used as a starting 
point for forming hypotheses about distribution patterns which should be tested with further sampling and 
combined with additional species-specific survey data and expert opinion before being used for decision- 
making purposes. We note that Caspian Terns are now commonly seen in Rhode Island waters (Paton et al., 
2010), but no Caspian Terns were positively identified in the Manomet dataset. 

Unidentified Gulls 

The Unidentified Gulls group (Figure 6.30, Tables 6.75-77) could consist of a variable set of species depending 
on the location within the study area and the time of year. This group had enough sightings to model in fall, 
winter, and spring, but not summer, suggesting that its members breed elsewhere and are passing through the 
area in spring/fall or overwintering. Another possibility is that many of the unidentified gulls could be juveniles; 
juvenile gulls can be very difficult to identify. The low numbers in summer are consistent with this hypothesis 
as young gulls would not yet be fledged at that point. The peak predicted abundance is in fall near Martha's 
Vineyard. Abundance is also predicted nearshore throughout the study area (with smaller peaks near the 
mouth of Long Island Sound, near Nantucket Shoals, and near New York Bay). This pattern is fairly consistent 
across seasons. Cross-validation performance is acceptable to good in the medium and high certainty
classes, though there is high white noise in Stage I and the model-predicted confidence intervals are too 
small (low %1SD). Overall, model results for this group should be treated with caution as the identity of the 
component species is unknown, possibly variable over time and space, and possibly overlapping with some of 
the other, positively identified, gull species and groups that were modeled separately. 

6.9.3. Non-modeled species groups 

Cormorants 

Cormorants (Figure 6.31, Table 6.78) are infrequent in the Manomet database, due in part to poor sampling 
in very nearshore areas. Point sightings of cormorants are most common in winter and are clustered near 
the east end of Long Island, Block Island, Martha's Vineyard, New York Bay, and southwest of Great Peconic 
Bay. More data are needed to assess the spatial distribution of cormorants in offshore waters of NY if they
are of particular interest. In the winter, when most common, unidentified birds were probably mainly Great 
Cormorants (P. Paton, pers. comm.). 

Rare Visitors 

The Rare Visitors group (Figure 6.32, Table 6.79) consists of species that are non-breeding, transient, and 
rare in the study area. Sightings are very infrequent (<1% total) and scattered around the study area with little 
obvious spatial pattern. If any of these species (listed in Table 6.2) are of particular interest, detailed additional 
studies will have to be performed. 

Less Common Skuas 

The Less Common Skuas group (Figure 6.33, Table 6.80) includes the Great Skua and unidentified sightings 
of skuas (species in the family Stercorariidae that are not Jaegers). The unidentified skuas are very likely
to have been either poorly seen Great Skuas or South Polar Skuas, as these are the only two skua species 
recorded in the western North Atlantic. Sightings of this group are very infrequent in the study area (<1%). Most 
sightings are in the fall months, similar to the Pomarine Jaeger. Sightings generally occur offshore along the 
outer continental shelf, and are slightly more concentrated toward the south-central part of the domain near the 
shelf. If the Great Skua or other skuas thought to be represented by this group are of particular concern, more 
detailed studies or additional data collection and modeling should be conducted. 



6.9.4. 'No birds sighted' 

Figure 6.34A shows the annual predicted index of abundance of surveys that result in no sightings of seabirds 
(measured by multiplying the predicted probability of occurrence by the predicted transect area in which no 
birds were detected, in km²). Figure 6.34B shows the probability of at least one survey in any season resulting
in "no birds sighted" (i.e., the annual integrated presence probability, calculated as if the 'no birds sighted' 
category were a species). Tables 6.81, 6.82, and 6.83 summarize the input data, predictors, and diagnostic 
statistics, respectively, for the 'no birds sighted' model. 
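The two quantities mapped in Figure 6.34 can be sketched as follows. The index multiplies a predicted probability by a predicted transect area, as described above. The report does not state here how seasons are combined into the annual integrated probability; this sketch assumes seasons are independent, so the chance of at least one 'no birds sighted' survey is one minus the chance of none. Function names and values are hypothetical.

```python
from math import prod


def no_birds_index(p_no_birds: float, area_km2: float) -> float:
    """Predicted probability of a 'no birds sighted' survey times the
    predicted transect area with no detections (km^2)."""
    return p_no_birds * area_km2


def annual_integrated_probability(seasonal_probs):
    """P(at least one season yields 'no birds sighted'), ASSUMING
    independence across seasons (hypothetical simplification)."""
    for p in seasonal_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in seasonal_probs)


# Illustrative seasonal probabilities: spring, summer, fall, winter
print(annual_integrated_probability([0.5, 0.5, 0.0, 0.0]))  # 0.75
print(no_birds_index(0.5, 2.0))  # 1.0
```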

The eastern end of Long Island and areas near Block Island and Martha's Vineyard have a lower probability 
than average of experiencing times without seabirds (i.e., most surveys in these areas see seabirds). The 
inner shelf in the Long Island Platform vicinity, about 10-30 km offshore, has an above average probability 
of experiencing times without seabirds. The patterns vary somewhat from season to season, and predictive 
performance of the model is fair to poor (high white noise, marginal error statistics). The reader should be 
particularly cautious of high predicted "no birds sighted" values within 10 km of shore as Manomet survey 
coverage drops off rapidly in the nearshore. Nonetheless, there are discernible spatial patterns that may be 
useful as an alternative to abundance hotspot maps to identify areas of potentially reduced conflict between 
ocean uses and seabirds. 

6.9.5. Hotspots 

Predicted hotspots of abundance for all modeled species combined (Figure 6.35) occur along the coast, 
especially along the Hudson Shelf Valley and in scattered areas throughout the shelf, particularly near 
Nantucket Shoals. The onshore-offshore gradient in abundance is consistent with many previous studies of 
seabirds in this region, and recent intensive survey work in New Jersey and Rhode Island. 

Species richness hotspots (Figure 6.36) are scattered throughout the center and northeast of the study area, 
extending south-southeast from Long Island to the shelf edge, and between Long Island and Nantucket Shoals. 
Low diversity predicted beyond the shelf edge is unreliable, as indicated by the uncertainty overlay. Very 
nearshore predictions are also unreliable throughout the domain. This is consistent with increased species 
richness along the Atlantic Flyway. The patchy, uneven pattern of the species richness predictions is a result of 
the discrete nature of this variable (there can only be whole numbers of species) combined with the necessity 
of choosing a somewhat arbitrary threshold to define a species as present or absent at a given location. 
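The step effect described above can be illustrated by counting the species whose predicted presence probability at a grid cell meets a cutoff. A minimal sketch with a hypothetical threshold and made-up probabilities (the species names come from this chapter, the values do not), showing how a small change in the cutoff changes richness by a whole species:

```python
def species_richness(presence_probs: dict, threshold: float) -> int:
    """Count species predicted present at one cell (p >= threshold)."""
    return sum(1 for p in presence_probs.values() if p >= threshold)


cell = {  # hypothetical predicted presence probabilities
    "Northern Gannet": 0.80,
    "Herring Gull": 0.55,
    "Dovekie": 0.49,
    "Cory's Shearwater": 0.10,
}
print(species_richness(cell, 0.50))  # 2
print(species_richness(cell, 0.45))  # 3
```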

The predicted Shannon diversity index (Figure 6.37) shows a smoother pattern that is distinct from the richness 
and abundance patterns. It reveals hotspots and coldspots of diversity scattered throughout the shelf, with 
highest diversity in the northeast of the study area near Nantucket and at the mouth of Long Island Sound. 
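The Shannon index varies continuously with both the number of species and the evenness of their abundances, which is why its map is smoother than the richness map. A minimal sketch of the standard form H' = -Σ p_i ln p_i, where p_i is species i's share of total predicted abundance; the use of the natural logarithm and the example abundances are assumptions, since the report's exact formulation is not given here.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    h = 0.0
    for a in abundances:
        if a < 0:
            raise ValueError("abundances cannot be negative")
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


even = shannon_index([25, 25, 25, 25])    # equals ln(4), about 1.386
dominated = shannon_index([97, 1, 1, 1])  # same richness, much lower H'
print(round(even, 3), round(dominated, 3))
```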

It is important to remember that concentrations of abundance and diversity can form and disperse rapidly, 
because seabirds are highly mobile, and interact with dynamic ecological resources and processes. Thus, 
the patterns displayed here reflect the considerable variability in concentrations of seabird abundance over 
the 9-year observation period (1980-1988). For example, hotspots may form in entirely different locations in 
different years resulting in multiple hotspots in the final map, not all of which form in any given year or season. 
It is also important to note that uncertainty accumulates when predictive statistical models are combined, a fact 
that is reflected in the uncertainty maps. Reliable hotspot predictions can only be produced in places where data
are sufficiently dense for all species.
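One way to see why uncertainty accumulates is to sum species-level predictions and propagate their standard deviations. The sketch below is a generic error-propagation illustration, not the procedure behind the report's uncertainty maps; it assumes a single non-negative correlation shared by all species pairs, and the values are hypothetical.

```python
import math


def combined_abundance(means, sds, correlation=0.0):
    """Sum species predictions; return (total, SD of total) assuming one
    common pairwise correlation in [0, 1] (hypothetical simplification)."""
    if len(means) != len(sds):
        raise ValueError("means and sds must have equal length")
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1] for this sketch")
    total = sum(means)
    variance = sum(s * s for s in sds)
    # Covariance terms: 2 * rho * sd_i * sd_j for each pair i < j
    for i in range(len(sds)):
        for j in range(i + 1, len(sds)):
            variance += 2.0 * correlation * sds[i] * sds[j]
    return total, math.sqrt(variance)


means, sds = [4.0, 2.0, 1.0], [3.0, 2.0, 1.0]
print(combined_abundance(means, sds))       # independent: SD = sqrt(14)
print(combined_abundance(means, sds, 1.0))  # (7.0, 6.0)
```

In either case the SD of the combined total exceeds that of any single species, so a hotspot map is only as reliable as the sparsest species-level data behind it.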

6.9.6. Point Maps of Seabirds of Concern 

Figure 6.38 shows the locations of sightings for four species of particular concern: the Roseate Tern, Common 
Tern, Least Tern, and Common Loon. The Piping Plover, listed as endangered by New York, was not in the 
Manomet dataset and therefore was not mapped. Most species of concern are sighted within 50 km of shore. 
Noticeable concentrations of sightings occur south of Jamaica and Great South bays. It is important to note 
that Figure 6.38 presents point sightings, not results of a predictive model. No information can be assumed 
regarding the presence or absence of the species in between sample points. The Common Loon and Common
Tern were common enough to model individually; the Roseate and Least Terns were too rare to produce individual
species predictive models and were instead included in the predictive model for "Less Common Terns."



Black-legged Kittiwake (Rissa tridactyla) 






Table 6.9. Data table: Black-legged Kittiwake. 
Table 6.10. Predictor table: Black-legged Kittiwake.
[Table cells not recoverable from the source. Columns: Occurrence and Abundance, each by season. Legible predictor rows include BATH and SLOPE.]
Table 6.11. Diagnostic table: Black-legged Kittiwake. 



Figure 6.8. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Black-legged Kittiwake, with certainty classes overlaid.




Common Loon (Gavia immer) 



Table 6.12. Data table: Common Loon. 
Table 6.13. Predictor table: Common Loon.
Table 6.14. Diagnostic table: Common Loon.
Figure 6.9. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Common Loon, with certainty classes overlaid (see legend in Figure 6.8).





Common Tern (Sterna hirundo) 



Table 6.15. Data table: Common Tern. 
Table 6.16. Predictor table: Common Tern.
Table 6.17. Diagnostic table: Common Tern.

Figure 6.10. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Common Tern, with certainty classes overlaid (see legend in Figure 6.8).

Cory's Shearwater (Calonectris diomedea)

Table 6.18. Data table: Cory's Shearwater.
Figure 6.11. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Cory's Shearwater, with certainty classes overlaid (see legend in Figure 6.8).



Dovekie (Alle alle) 



Table 6.21. Data table: Dovekie. 
Figure 6.12. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Dovekie, with certainty classes overlaid (see legend in Figure 6.8).
*Note: Spring and Fall observations combined with Winter for modeling.

Great Black-backed Gull (Larus marinus)






Table 6.24. Data table: Great Black-backed Gull. 
Table 6.25. Predictor table: Great Black-backed Gull.
Table 6.26. Diagnostic table: Great Black-backed Gull.
Figure 6.13. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Great Black-backed Gull, with certainty classes overlaid (see legend in Figure 6.8).

Great Shearwater (Puffinus gravis)



Table 6.27. Data table: Great Shearwater. 
Table 6.28. Predictor table: Great Shearwater.
Table 6.29. Diagnostic table: Great Shearwater.

Figure 6.14. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Great Shearwater, with certainty classes overlaid (see legend in Figure 6.8).

Herring Gull (Larus argentatus smithsonianus)



Table 6.30. Data table: Herring Gull. 
Table 6.31. Predictor table: Herring Gull.
Table 6.32. Diagnostic table: Herring Gull.

Figure 6.15. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Herring Gull, with certainty classes overlaid (see legend in Figure 6.8).

Laughing Gull (Leucophaeus atricilla)



Table 6.33. Data table: Laughing Gull. 
Table 6.34. Predictor table: Laughing Gull.
Table 6.35. Diagnostic table: Laughing Gull.

Figure 6.16. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Laughing Gull, with certainty classes overlaid (see legend in Figure 6.8).

Northern Fulmar (Fulmarus glacialis)



Table 6.36. Data table: Northern Fulmar.
Table 6.37. Predictor table: Northern Fulmar.
Table 6.38. Diagnostic table: Northern Fulmar.

Figure 6.17. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Northern Fulmar, with certainty classes overlaid (see legend in Figure 6.8).

Northern Gannet (Morus bassanus)



Table 6.39. Data table: Northern Gannet. 
Table 6.40. Predictor table: Northern Gannet.
Table 6.41. Diagnostic table: Northern Gannet.




Figure 6.18. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Northern Gannet, with certainty classes overlaid (see legend in Figure 6.8).
Pomarine Jaeger (Stercorarius pomarinus)



Table 6.42. Data table: Pomarine Jaeger.
Table 6.43. Predictor table: Pomarine Jaeger.
Table 6.44. Diagnostic table: Pomarine Jaeger.



Figure 6.19. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Pomarine Jaeger, with certainty classes overlaid (see legend in Figure 6.8).
Sooty Shearwater (Puffinus griseus)



Table 6.45. Data table: Sooty Shearwater. 
Table 6.46. Predictor table: Sooty Shearwater.
Table 6.47. Diagnostic table: Sooty Shearwater.



Figure 6.20. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Sooty Shearwater, with certainty classes overlaid (see legend in Figure 6.8).



Wilson's Storm-Petrel (Oceanites oceanicus) 




Table 6.48. Data table: Wilson's Storm-Petrel.
Table 6.49. Predictor table: Wilson's Storm-Petrel.
Table 6.50. Diagnostic table: Wilson's Storm-Petrel.



Figure 6.21. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Wilson's Storm-Petrel, with certainty classes overlaid (see legend in Figure 6.8).
Alcids, less common (4 species)

Table 6.51. Data table: Alcids, less common.
Table 6.52. Predictor table: Alcids, less common.
Table 6.53. Diagnostic table: Alcids, less common.

Figure 6.22. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Less Common Alcids, with certainty classes overlaid (see legend in Figure 6.8).

Coastal Waterfowl (7 species)

Table 6.54. Data table: Coastal Waterfowl.
Table 6.55. Predictor table: Coastal Waterfowl.
Table 6.56. Diagnostic table: Coastal Waterfowl.

Figure 6.23. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Coastal Waterfowl, with certainty classes overlaid (see legend in Figure 6.8).

Jaegers (2 species)

Table 6.57. Data table: Jaegers.
Table 6.58. Predictor table: Jaegers.
Table 6.59. Diagnostic table: Jaegers.

Figure 6.24. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Jaegers, with certainty classes overlaid (see legend in Figure 6.8).
Phalaropes (2 species)

Table 6.60. Data table: Phalaropes.
Table 6.61. Predictor table: Phalaropes.
Table 6.62. Diagnostic table: Phalaropes.

Figure 6.25. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Phalaropes, with certainty classes overlaid (see legend in Figure 6.8).
Shearwaters, less common (2 species)

Table 6.63. Data table: Shearwaters, less common.
Table 6.64. Predictor table: Shearwaters, less common.

Audubon's Shearwater
Photo by: David Pereksta, BOEM

Table 6.65. Diagnostic table: Shearwaters, less common.



Figure 6.26. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Less Common Shearwaters, with certainty classes overlaid (see legend in Figure 6.8).
Small Gulls, less common (2 species)



Table 6.66. Data table: Small Gulls, less common. 
Table 6.67. Predictor table: Small Gulls, less common.
Table 6.68. Diagnostic table: Small Gulls, less common.



Figure 6.27. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Less Common Small Gulls, with certainty classes overlaid (see legend in Figure 6.8).
Storm-Petrels, less common (3 species)



Table 6.69. Data table: Storm-Petrels, less common. 
Table 6.70. Predictor table: Storm-Petrels, less common.
Table 6.71. Diagnostic table: Storm-Petrels, less common.

Figure 6.28. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Less Common Storm-Petrels, with certainty classes overlaid (see legend in Figure 6.8).

Terns, less common (7 species)

Table 6.72. Data table: Terns, less common.

Roseate Tern

Photo by: David Pereksta, BOEM 



Table 6.73. Predictor table: Terns, less common.
Table 6.74. Diagnostic table: Terns, less common.



Figure 6.29. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Less Common Terns, with certainty classes overlaid (see legend in Figure 6.8).
Unidentified Gulls (0 species*)




Table 6.75. Data table: Unidentified Gulls. 
Table 6.76. Predictor table: Unidentified Gulls.
Table 6.77. Diagnostic table: Unidentified Gulls.

Figure 6.30. Predicted annual average relative index of abundance (SPUE, # indiv/km²/15-min) for Unidentified Gulls, with certainty classes overlaid (see legend in Figure 6.8).

*This group contains two categories of unidentified gulls in the family Laridae, but no positively identifiable species.

Cormorants (2 species)

Table 6.78. Data table: Cormorants.
Figure 6.31. Point observations of the relative abundance index (SPUE, # indiv/km²/15-min) for Cormorants. Dark blue shading indicates unsampled areas; no predictive modeling was done for this species group due to insufficient data.

Rare Visitors (70 species)

Table 6.79. Data table: Rare Visitors.
Figure 6.32. Point observations of the relative abundance index (SPUE, # indiv/km²/15-min) for Rare Visitors. Dark blue shading indicates unsampled areas; no predictive modeling was done for this species group due to insufficient data.

Skuas, less common (1 species) 




Table 6.80. Data table: Skuas, less common. 
Figure 6.33. Point observations of the relative abundance index (SPUE, # indiv/km²/15-min) for Less Common Skuas. Dark blue shading indicates unsampled areas; no predictive modeling was done for this species group due to insufficient data.






'No birds sighted' 



Table 6.81. Data table: 'No birds sighted'. 
Table 6.82. Predictor table: 'No birds sighted'.
Figure 6.34A. Predicted annual average relative index of survey effort resulting in 'No birds sighted' observations, with error class overlay. Units are km² of transect area in which no birds were sighted in a 15-minute survey. Legend as in Figure 6.8.





Table 6.83. Diagnostic table: 'No birds sighted'.
Figure 6.34B. Predicted annual integrated probability of a 'No birds sighted' observation. Units are probability of 'no birds sighted' in a 15-minute survey in at least one of the four seasons. Annual integrated presence probability was calculated as described in Appendix 6.A. For the associated certainty map, see Appendix 6.C.
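The exact annual integration method is given in Appendix 6.A and is not reproduced in this section. As a hedged illustration only: if the four seasonal probabilities at a location were treated as independent, the probability of at least one occurrence would be one minus the product of the seasonal non-occurrence probabilities. The independence assumption and the function name `annual_integrated_probability` are ours, not necessarily the report's method.

```python
def annual_integrated_probability(seasonal_probs):
    """Probability of at least one occurrence across seasons.

    ASSUMPTION (illustrative only): seasonal outcomes are independent, so
    P(at least one) = 1 - product over seasons of (1 - p_season).
    """
    prob_none = 1.0
    for p in seasonal_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        prob_none *= 1.0 - p  # probability of no occurrence in any season so far
    return 1.0 - prob_none


# Hypothetical winter, spring, summer, and fall probabilities at one grid cell.
print(annual_integrated_probability([0.1, 0.2, 0.3, 0.4]))
```

If seasonal occurrences are positively correlated in reality, this formula would overstate the annual probability, which is one reason to defer to the appendix for the method actually used.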



Box 6.2. Notes on the 'No birds sighted' analyses

In the course of the MBO CSAP standardized visual surveys, observers recorded instances when no seabirds of any kind were sighted in a standardized 15-minute observation period. We analyzed these 'No birds sighted' observations using the same predictive statistical model as for other species and groups, with one exception: instead of SPUE, the 'relative abundance' of 'No birds sighted' was measured as the area (in km²) of the transect in which no birds were sighted.

The maps at left (Figure 6.34A and B) may be useful as an alternative or supplement to hotspot maps (Figures 6.35-6.37) to identify areas of potentially high or low conflict between seabird and human uses of ocean habitat.



Box 6.3. Notes on Hotspot Analyses 

Abundance and diversity are two important metrics of ecosystem structure. Concentrations of abundance 
can indicate areas that are important to multiple species for feeding, reproduction, migration, refuge 
from unfavorable conditions (storms, predators), and other important aspects of seabird life cycles. 
These may be apparent in annual distributions, or occur only seasonally due to varying environmental 
conditions and timing of life cycles. 

Abundance hotspots alone do not give a complete picture of important areas for seabirds, because 
they can be driven by the presence of only one or a few very abundant species. Areas of high diversity, 
as measured by the Shannon diversity index (H'), represent places where diverse bird communities 
form aggregations of many species in which even rarer species are relatively well-represented. These 
may represent convergences of environmental conditions suitable for many species, or may arise from 
species interactions. Regardless of the reason for high diversity, diversity hotspots are often considered 
of high conservation value because of the relatively high number of species that can be protected in a
relatively small area. 

Beyond their ecological importance, hotspots of abundance and diversity are important for marine spatial 
planning because they represent areas where large numbers of individual birds and/or large numbers 
of bird species may be affected by human activities. Diversity hotspots may also be of value for
non-consumptive human activities such as bird-watching.

On this page and the pages that follow, annual predicted abundance hotspots (Figure 6.35), species 
richness hotspots (Figure 6.36), and Shannon diversity index hotspots (Figure 6.37) are shown. 
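The three hotspot metrics can be illustrated for a single set of per-species values. The sketch below assumes that richness counts species with non-zero abundance and that H' uses natural logarithms (H' = -Σ share × ln(share), where share is each species' fraction of total abundance). It works on hypothetical counts and does not reproduce how the report combines modeled predictions across species.

```python
import math


def hotspot_metrics(abundance_by_species):
    """Summed abundance, species richness, and Shannon diversity (H')
    for one location, from a mapping of species -> abundance (>= 0)."""
    values = [v for v in abundance_by_species.values() if v > 0]
    total = sum(values)
    richness = len(values)
    if total == 0:
        return 0.0, 0, 0.0
    # H' = -sum(share * ln(share)), share = species' fraction of total abundance
    shannon = -sum((v / total) * math.log(v / total) for v in values)
    return total, richness, shannon


# Hypothetical counts from one survey; the zero entry does not add to richness.
counts = {"Northern Gannet": 6, "Herring Gull": 3, "Laughing Gull": 1,
          "Wilson's Storm-Petrel": 0}
print(hotspot_metrics(counts))  # total 10, richness 3, H' about 0.898
```

A single very abundant species raises summed abundance but pulls H' toward zero, which is why abundance and diversity hotspots can fall in different places.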



Abundance Hotspots 
Figure 6.35. Predicted seabird abundance hotspot map.



Richness Hotspots
Figure 6.36. Predicted species richness hotspot map.



Diversity Hotspots
Figure 6.37. Predicted Shannon species diversity index hotspot map.



Sightings Data for Species of Concern
Seabirds of Concern Distribution
Figure 6.38. Point map of raw sightings data for species of concern. This map shows the location of sightings in the Manomet dataset for four species of seabirds listed as endangered, threatened, or of special concern by New York's Department of Environmental Conservation: the Roseate Tern (Sterna dougallii dougallii), Common Tern (Sterna hirundo), Least Tern (Sternula antillarum), and Common Loon (Gavia immer). The Piping Plover (Charadrius melodus), listed as endangered by New York, was not in the Manomet dataset and therefore was not mapped.




6.10. DISCUSSION 

The spatial models, data, and associated statistics presented in this chapter provide information on the long- 
term annual and seasonal spatial distribution of the marine avifauna offshore of NY. Results focus on identifying 
areas of likely and possible persistent aggregations of seabird species, species groups, and communities 
(multi-species hotspot/coldspot analyses) in the NY Bight. These maps are expected to be of high utility for 
marine spatial planning decisions involving seabirds in the NY Bight. 

This report provides information on the long-term annual and seasonal spatial distribution of seabirds in the 
NY region, which is one important consideration for design of spatial management strategies. Such long-term 
average maps, known as "spatial climatologies," are useful for characterizing long-term average patterns of 
abundance and occurrence of organisms in space (e.g., Santora and Reiss, 2011). However, it is important 
to emphasize that climatological mapping is only one step towards developing an effective and defensible 
management and conservation strategy for seabird populations. Consideration of species' differential sensitivity 
to different offshore activities, and differences in population-level vulnerability to mortality or other demographic 
changes will also be critical to a comprehensive evaluation of risks posed by offshore management actions 
(Drewitt and Langston, 2006; Hueppop et al., 2006; Hatch and Brault, 2007; Allison et al., 2008; Watts, 2010). 
The climatological seabird occurrence and abundance models presented here should be viewed as measures 
of the potential exposure of different species to the effects of spatial management actions, rather than a direct 
measure of risk (Crichton, 1999). Although beyond the scope of this report, future work should address the 
incorporation of "exposure models" using the information generated here into comprehensive risk analyses for 
specific types of proposed management and/or development activities. 

We wish to urge several cautions regarding inference of hotspots and coldspots of abundance from the maps 
in this report. First, the reader is urged to consult both the certainty class overlays presented on Figures
6.8-6.37 and the diagnostic statistics presented in associated tables. Second, if a particular hot or coldspot is of
interest, the reader should consider consulting Appendix 6.C to determine whether that feature occurs in all 
seasons or only in particular seasons, whether it is directly supported by data points or is merely inferred from 
environmental conditions, and whether the model cross-validation plots indicate good performance outside the 
training data set. Third, if an important decision is to be made on the basis of a particular feature seen in model 
predictions, we would recommend several independent efforts to confirm model findings: 

• Examination of original Manomet data to determine whether the hot or coldspot persisted over many 
years consistently or was only sporadic 

• Collection of new survey data to confirm findings 

• Consultation with local experts 

In particular, hotspots or coldspots based only on extrapolation from environmental conditions, with no nearby 
data points, should be considered hypotheses to be tested rather than certainties. The data point overlay maps 
in Appendix 6.C are very useful for distinguishing such features from those that are directly supported by data 
points. 

Similar cautions apply when judging the persistence of a hotspot or coldspot seen in the model predictions
presented here. The spatial climatological modeling approach used here does not account for how observations
are distributed over time. Several high observations in a small area can lead to a
predicted hotspot, but they may all have occurred in the same year. One way to evaluate whether a hotspot is 
persistent over time within a year is to compare seasonal maps. However, to assess inter-annual persistence it 
is necessary to go back to the original data and examine the years in which observations associated with each 
potential hot or coldspot were made. 
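As a minimal sketch of that check, assume the original Manomet records falling inside a candidate hotspot can be extracted as (year, SPUE) pairs; this record layout and the function name are hypothetical. The sketch counts the distinct years with sightings and the share of summed SPUE contributed by the single highest year. A hotspot driven mostly by one year would show few years and a top-year share near 1.

```python
from collections import defaultdict


def persistence_summary(observations, min_spue=0.0):
    """Summarize inter-annual support for a candidate hotspot.

    observations: iterable of (year, spue) pairs for surveys falling inside
    the candidate hotspot (hypothetical record layout).
    Returns (number of distinct years with SPUE > min_spue,
             share of the summed SPUE contributed by the top year).
    """
    spue_by_year = defaultdict(float)
    for year, spue in observations:
        if spue > min_spue:
            spue_by_year[year] += spue
    total = sum(spue_by_year.values())
    if total == 0:
        return 0, 0.0
    top_share = max(spue_by_year.values()) / total
    return len(spue_by_year), top_share


# Hypothetical example: almost all of the signal comes from 1981.
records = [(1981, 12.0), (1981, 8.0), (1981, 0.0), (1984, 0.5)]
n_years, share = persistence_summary(records)
print(n_years, round(share, 3))  # prints: 2 0.976
```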

Similar cautions should be observed with multi-species abundance, richness, and diversity hotspot maps. 
Appendix 6.D gives a detailed account of the performance of these models in a leave-50%-out cross-validation 
test, and should be used as a guide to interpreting multi-species hotspot model results. In particular, it is 
important to note that while the hotspot maps predict the relative values of abundance, richness, and diversity well,
the predicted values tend to be systematically higher than what is actually observed
in any single 15-minute survey. This issue is discussed in Appendix 6.D and correction factors are presented, 
based on cross-validation, which could be applied if the absolute values of summed abundance, richness, or 
diversity are important. 

Consideration of temporal changes and population, community, and ecosystem dynamics will also be critical 
to any long-term management strategy. Many dynamic aspects of seabird ecology are not directly captured 
by the maps presented here. Seabirds interact with a highly dynamic environment where the location of food 
resources, competition, and threats are constantly moving. Consequently, seabird habitats vary in location, timing,
and extent as human activities and the biophysical parameters of the marine ecosystem change. The age of the
data used to develop the models presented here means that the patterns
can only strictly be viewed as representative of the 1980's. Generalization to the present day requires an 
assumption that the human and natural environment has not changed. This assumption is likely to be more valid 
for some species than others, and in some areas than others (see Section 6.9.1., Species Notes). Similarly, 
our results assume that the climatological patterns of ocean conditions (including sea surface temperature, 
stratification, chlorophyll, turbidity, and zooplankton) have not undergone substantial shifts over the period from 
1980-present. In reality, ocean climate is known to vary in ways that influence seabird distribution (e.g., Zipkin 
et al., 2010), further underscoring the need for collection and analysis of more up-to-date comprehensive 
regional seabird surveys. 

The importance of different environmental predictors varied from species to species and often among seasons 
for a given species (Figure 6.5). Different predictors were sometimes important for Stage I (presence/absence) 
models when compared to Stage II (abundance-when-present) models. Overall, dynamic ocean climate 
variables like surface chlorophyll, surface turbidity, zooplankton, sea surface temperature, and stratification 
were often important, but for certain species static predictors like distance from shore and bathymetry were of 
equal or greater importance. Species that are more strongly correlated with dynamic oceanographic variables 
are more likely to be affected by long-term trends in ocean climate; thus, Figure 6.5 could be used to identify 
species more likely to be affected by ocean climate shifts. 

Ideally, the information in this report would serve as an initial guide to be followed up by more current, detailed 
studies or syntheses of more recent data for selected areas of interest. Information presented here could be 
supplemented with a meta-analysis including other seabird survey datasets, non-quantitative observations, 
unpublished reports, and reports from the primary scientific literature. Section 6.7. identifies several existing 
datasets that would be useful in a meta-analysis. Additional data as well as additional modeling could supplement 
and improve our understanding of any particular species and area. New modeling technologies are constantly 
emerging; machine learning techniques such as TreeNet, RandomForest and others are likely to outperform 
GLM-based approaches for modeling trend surfaces and should be explored in the future (Oppel et al., 2012). 

Several additional caveats should be considered when exploring the results presented in this report. The 
following is a non-exhaustive list (other, more technical caveats are discussed in Section 6.A.14.): 

• Data used in seabird assessments are biased towards species which are larger, commonly seen, 
conspicuous, abundant and/or attracted to ships. Birds that are small, rare, nocturnal or avoid ships were 
likely undercounted (see Tasker, 1984 for species/group detection information). 

• Predictions quantify the expected long-term average relative abundance of a species or species group, 
when averaged over time. A prediction of high abundance does not mean many individuals will be sighted 
there all the time; the hotspot may only occur periodically, and many individual observations may still be 
zero even if the long-term average abundance is high. 

• Predictions made in Long Island Sound, nearshore (<5 km from land) and off the shelf slope (waters 
deeper than 2,000 m) can be highly uncertain, because little survey effort occurred in these areas. 

• Sightings per unit effort are a relative index of abundance and should not be confused with an absolute 
population estimate. 



Implications of the first point about detectability and attraction are particularly important to consider. Estimates 
of detectability require frequent, replicated repeat surveys over the same area, and therefore are not possible 
with the Manomet dataset. However, future targeted survey efforts may be able to estimate differences in 
detectability for different species with methods similar to the Manomet methods. In that case, the model 
estimates presented here could be updated to reflect differences in detectability. Biases due to attraction are 
more difficult to correct. Certain species are more likely to be affected by detectability and attraction issues than 
others. For example, as noted in Section 6.9.1., Northern Gannets are attracted to fishing trawlers, and the 
presence of large factory trawlers near the shelf break in the 1980's may have influenced some of the apparent
hotspots for this species. Conversely, most terns are smaller and harder to detect or identify to species-level, 
and their abundance is probably underestimated. Diving birds like cormorants and dovekies also have lower 
detectability because they are sometimes underwater. These issues underscore the importance of treating the 
measures of relative abundance presented here as proxies for the real underlying patterns, and considering 
them in the light of other datasets, the life history of each species, and independent local expert opinion. 

Notwithstanding these caveats and the inevitable uncertainties associated with modeling a complex, dynamic 
community from scattered data, the maps presented here represent the first high-resolution depiction of spatial 
patterns in the marine avifauna of New York, and as such, represent an important step towards design of 
effective spatial conservation, management, and sustainable use strategies for New York's offshore waters. 
Chapter 6: Online Supplements

Online Supplements can be found by going to http://oceanservice.noaa.gov/programs/nccos/welcome.html 

and searching on the keywords: New York Spatial Plan 

Online Supplement 6.1: Predictor variable transformation details 

This document provides additional detail on the statistical preparation of potential predictor variables 
for analysis. 

Online Supplement 6.2: Species and group seasonal models, full diagnostic reports 

This Online Supplement is presented in the form of an HTML document providing links to full diagnostic 
reports from each of the seasonal species/group predictive models. 
Appendix 6.A. Statistical Methods



6.A.1. Model overview 

We adopt a two-stage approach that separates a model of the presence probability of a species from a model 
of its relative abundance when it is present. This approach has been successfully used to model highly
zero-inflated marine distribution data (e.g., Stefansson, 1996; Ver Hoef and Jansen, 2007; Winter et al., 2011). This
technique is also referred to in the statistical literature as a hurdle model (Cragg, 1971; Potts and Elith, 2006;
Ver Hoef and Jansen, 2007). In our case we refer to the two parts of the model as Stage I and Stage II. Stage I
models the probability, $p_i(x,y)$, that species or group $i$ is observed in a survey at location $(x,y)$ in a given season
(models were repeated for each season, but seasonal subscripts are omitted for simplicity):

$$p_i(x,y) = \Pr\{i \text{ observed at } (x,y) \text{ in a single 15-minute survey}\} \qquad \text{(Eq. 1, Stage I)}$$

Here, $p_i(x,y)$ is treated as a spatial random variable whose value is a probability; the details of how it is modeled
are discussed below and in Sections 6.A.4., 6.A.5., and 6.A.6. We do not distinguish between observation
and presence; the probability $p_i(x,y)$ is assumed to be equal to the probability that the species was actually
present during a single 15-minute survey conducted at any time during the 9-year study period. In other words, the
probability of detection when the species is present is assumed to be 1; consequences of this assumption are discussed
in Section 6.A.14.

Stage II models $E\{Z_i(x,y) \mid P_i(x,y)=1\}$, the long-term mean of the observed relative abundance (SPUE), $Z_i(x,y)$,
of species or group $i$ at location $(x,y)$ when the species or group is present:

$$E\{Z_i(x,y) \mid P_i(x,y) = 1\} \qquad \text{(Eq. 2, Stage II)}$$

Here $Z_i(x,y)$ is a continuous random variable representing relative abundance (number of individuals sighted
per 15-minute survey per km² of survey area), and $P_i(x,y)$ is a Bernoulli random variable whose probability of
success in a single trial is given by $p_i(x,y)$. Note that $E\{A \mid B\}$ represents the conditional expectation operator,
which returns the expected value (arithmetic average over many trials) of the random variable $A$, given the
value of the random variable $B$. This expected value can be thought of as the average SPUE that would have
been recorded if the same location had been visited many times, instead of only once, during the 9-year survey
period, and only non-zero values were included in the average. In this model, the observed value of SPUE at
each location is our single observation of the random variable $Z_i(x,y)$, conditional on the outcome of $P_i(x,y)$ at
that location (0 if species $i$ is absent, 1 if present). Over a 9-year period, assuming 6 hours of potential survey time
per day, approximately 20,000 temporally non-overlapping surveys could have been conducted at each location
in each season. If hypothetical repeat surveys were conducted and averaged (excluding zero observations),
then the value of that average would approach that of Eq. 2 as the number of repeat surveys increased,
if the relevant assumptions outlined in Section 6.A.14. are also met.
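The figure of roughly 20,000 follows if a season is taken as one quarter of a year (about 91 days) and each survey hour holds four non-overlapping 15-minute surveys. The season length is our assumption; the text does not state it.

```latex
N \approx 9~\text{yr} \times \frac{365~\text{d/yr}}{4~\text{seasons}}
      \times 6~\frac{\text{h}}{\text{d}} \times 4~\frac{\text{surveys}}{\text{h}}
  = 9 \times 91.25 \times 6 \times 4 = 19{,}710 \approx 20{,}000
```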

The MBO seabird data, processed as described in Section 6.8.3., are conceptually modeled as a set 
of outcomes of the purely spatial (non-temporal) random variables $P_i(x,y)$ (Stage I) and $Z_i(x,y)$ conditional
on $P_i(x,y)=1$ (Stage II). This relies on the basic assumption that the parameters that define these random
variables (described in more detail below) do not vary over time within a season or among survey years. 
Implications of this assumption are discussed in Section 6.A.14. The use of spatial random variables without 
an explicit temporal component is termed a spatial climatological approach and has been used elsewhere to 
map "hotspots" and "coldspots" in long-term average patterns of species distribution (e.g., Santora and Reiss, 
2011). The word climatology in this context means long-term average. 

Both Stage I and Stage II of the model are themselves composed of two sub-models: a trend model and a
residual model, described in more detail below. The trend models are implemented as generalized linear 
models (GLMs), and predict large-scale variation in a species' distribution from environmental variables. The 
residual models are implemented as geostatistical models (kriging) to account for spatial autocorrelation in the 
residuals from the trend (Cressie, 1993; Pebesma, 1998). 



The GLM trend component was necessary because exploratory data analysis showed that both probability 
of presence (Stage I) and abundance when a species is present (Stage II) showed large-scale trends that 
were related to environmental variables. Notably, presence/absence often showed different large-scale spatial 
patterns than abundance when the species was present, motivating the two-stage approach. Other types of 
trend models are possible, and could be explored in future work (e.g., generalized additive models, classification 
and regression trees). 

The geostatistical component was necessary because the data are clustered and unevenly distributed in 
space, and preliminary analysis after removal of large-scale trends with GLM revealed autocorrelation in the 
spatial pattern of residuals. When this is the case, spatial dependence must be explicitly modeled to obtain 
unbiased estimates of GLM coefficients, as well as to properly model uncertainty at unsampled locations 
(Cressie, 1993; Chiles and Delfiner, 1999). A major advantage of the hybrid GLM-geostatistical approach is 
that predictions are accompanied by spatially explicit estimates of uncertainty, because spatial dependence in 
error fields is explicitly modeled (Pebesma, 1998). 

The final seasonal model prediction of SPUE is the product of the Stage I and Stage II maps, which gives the
unconditional expected value of $Z_i(x,y)$:

$$E\{Z_i(x,y)\} = p_i(x,y) \cdot E\{Z_i(x,y) \mid P_i(x,y)=1\} \qquad \text{(Eq. 3, Stage I} \times \text{II)}$$

This result follows directly from application of basic laws of probability and conditional expectation for random
variables (Cragg, 1971; Ross, 2007): by the law of total expectation, $E\{Z_i\} = p_i \, E\{Z_i \mid P_i=1\} + (1-p_i) \, E\{Z_i \mid P_i=0\}$,
and the second term vanishes because SPUE is 0 whenever the species is absent. The final predicted value represents
the average number of birds that would be seen if a site were surveyed repeatedly (using the same standardized
15-minute surveys), counting surveys in which the species was not seen as values of 0.
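A small Monte Carlo check makes Eq. 1 to Eq. 3 concrete. The sketch below simulates many repeat surveys at a single location under a hurdle model with a hypothetical presence probability and, for illustration only, an exponential distribution for SPUE when present (not the distribution used in the report). The fraction of surveys with the species present estimates Eq. 1, the mean of those surveys estimates Eq. 2, and the mean over all surveys, zeros included, equals their product as in Eq. 3.

```python
import random


def simulate_surveys(p_present, mean_when_present, n_surveys, seed=1):
    """Simulate repeat 15-minute surveys at one location under a hurdle model.

    Presence is Bernoulli(p_present); SPUE when present is drawn from an
    exponential distribution with the given mean (illustrative choice only).
    Returns a list of (present, spue) pairs.
    """
    rng = random.Random(seed)
    surveys = []
    for _ in range(n_surveys):
        if rng.random() < p_present:
            surveys.append((True, rng.expovariate(1.0 / mean_when_present)))
        else:
            surveys.append((False, 0.0))
    return surveys


p, mu = 0.3, 5.0  # hypothetical presence probability and mean SPUE when present
surveys = simulate_surveys(p, mu, n_surveys=20_000)
spue_all = [z for _, z in surveys]
spue_present = [z for present, z in surveys if present]

frac_present = len(spue_present) / len(surveys)          # estimates Eq. 1
mean_if_present = sum(spue_present) / len(spue_present)  # estimates Eq. 2, near mu
mean_all = sum(spue_all) / len(spue_all)                 # estimates Eq. 3, near p * mu

print(frac_present, mean_if_present, mean_all, frac_present * mean_if_present)
```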

The seasonal modeling process can be summarized as follows. For each species and group, for each season 
that can be modeled, the following steps are performed: 

1. Transform potential predictor variables for linearity. See Section 6.A.2. below.

2. Divide data into training and validation ("holdout") subsets for cross-validation purposes. See Section 
6.A.3. below. 

3. Stage I trend model: Use a GLM (binomial distribution, logit link) to generate a predictive map of the mean 
probability of species occurrence. See Section 6.A.4. below. 

4. Stage I residual model: Use ordinary indicator kriging (OIK) to predict the "residual" probability map, where "residual" is defined as the probability that the regression model leads to an incorrect classification of the presence state ($P_i(x,y)$) of a given location. See Section 6.A.5. below.

5. Final Stage I model: Adjust the trend-predicted probability map using the kriged residual probability map 
from step 4. The trend from step 3 and residual from step 4 are combined using probability laws. See 
Section 6.A.6. below. 

6. Stage II trend model: Use a GLM (normal distribution, identity link) to generate a predictive map of the mean abundance of a species when it is present. Data were transformed for normality for this part of the analysis using a Box-Cox type transformation (Box and Cox, 1964), described further below, and back-transformed for final maps. See Section 6.A.7. below.

7. Stage II residual model: Use simple kriging (SK) to predict a residual map for the regression model of abundance. See Section 6.A.8. below.

8. Final Stage II model: Add the trend map from step 6 and the residual map from step 7. See Section 6.A.9. 
below. 

9. Final Stage I x II model prediction: Multiply the predicted probability of occurrence at each location by the predicted abundance if present to produce the final prediction of the expected value (long-term average) of abundance at each location. See Section 6.A.10. below.

10. Relative uncertainty calculation: scaled relative uncertainty values were calculated for the trend, residual, and final models for Stage I and Stage II, and for the final Stage I x II prediction. See Section 6.A.11. below.

11. Model evaluation, cross-validation, and relative uncertainty calibration. See Section 6.A.12. below. 

The sections below describe each of these steps in detail. 

6.A.2. Transformation of potential predictor variables for linearity 

Transforming independent variables in a multiple linear regression context for normality, centrality, and 
homogeneity of variance is often desirable for stabilizing estimates of regression parameters, and can also 
help to linearize relationships between predictors and response (Sokal and Rohlf, 1995). The family of power- 
law transformations studied by Box and Cox (1964) is particularly useful for improving both normality and 
linearity. A Box-Cox transformation is defined as follows, where X denotes the original variable and X* the 
transformed variable: 



$$X^{*} = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(X), & \lambda = 0 \end{cases} \qquad \text{(Eq. 4)}$$

A maximum-likelihood procedure (Box and Cox, 1964; Dror, 2006) was used to estimate the Box-Cox transformation parameter λ for each potential predictor variable and to guide the final choice of stabilizing transformation for each predictor. A priori knowledge about the types of transformations likely to be justified for different types of variables was also considered (Sokal and Rohlf, 1995). The transformation expressions for the predictors are shown in Table 6.A.1. Note that the transformation of some of these variables changes the sign of the linear relationship between variable and response; care must therefore be taken in interpreting the signs of regression coefficients for transformed predictors. Details of transformation choices and pre- and post-transformation distributions are given in Appendix 6.B. and Online Supplement 6.1.

Transformed predictor variables were centered and standardized prior to each GLM fit, using the values of each predictor variable at the data locations under consideration. Centering and standardization were repeated just before every GLM run, because different patterns of missing predictor data could cause different data points to be used, requiring re-centering and re-standardization.

Table 6.A.1. Predictor variable transformations.

PREDICTOR VARIABLE   TRANSFORMATION EXPRESSION    NOTES
BATH                 X* = (1 - X)^(-0.4)          For all X < 0
SLOPE                X* = X^(-0.4)
DIST                 X* = X^(0.6)
SSDIST               X* = X                       Not transformed
SST                  X* = 11605/(X + 273.15)      Arrhenius transform (Laidler, 1997)
STRT                 X* = X                       Not transformed
TUR                  X* = 1/X
CHL                  X* = 1/(X + 1)
ZOO                  X* = X                       Not transformed
SLPSLP               X* = X^(-0.3)
PHIM                 X* = 1/(X + 3)

6.A.3. Selection of training and validation subsets for cross-validation

Half (50%) of the observation locations were selected at random to be used in subsequent model fitting (henceforth referred to as the training set), with the remaining 50% withheld for cross-validation (henceforth referred to as the validation or holdout set). All model selection and model fitting (Sections 6.A.4. to 6.A.10.) were carried out using only the training set. Cross-validation statistics were calculated by comparing model predictions at the holdout locations to the true data values at those locations. Final predictive maps, however, used all available data: the models selected and fit using the training data were applied to the entire original dataset. Cross-validation error estimates are thus conservative, in the sense that they were derived from models fit to a dataset half the size of the final dataset.
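The random 50/50 split and the per-fit centering and standardization can be sketched in Python as follows. This is an illustrative sketch, not the report's code: rows are assumed to be dicts keyed by predictor name, missing predictor values are assumed to be encoded as None, and the seed is arbitrary.

```python
import random
import statistics


def split_training_holdout(n_locations, seed=0):
    """Randomly assign half of the observation locations to training, the rest to holdout."""
    rng = random.Random(seed)
    idx = list(range(n_locations))
    rng.shuffle(idx)
    half = n_locations // 2
    return sorted(idx[:half]), sorted(idx[half:])


def standardize_complete_cases(rows, predictors):
    """Keep rows with no missing value among `predictors`, then center and scale on those rows only.

    Re-running this for every candidate predictor set mirrors the re-centering and
    re-standardization described in Section 6.A.2.
    """
    complete = [r for r in rows if all(r.get(k) is not None for k in predictors)]
    out = [dict(r) for r in complete]
    for k in predictors:
        vals = [r[k] for r in complete]
        mean, sd = statistics.fmean(vals), statistics.stdev(vals)
        for r in out:
            r[k] = (r[k] - mean) / sd
    return out


train, holdout = split_training_holdout(10)
rows = [{"a": 1.0, "b": 2.0}, {"a": 2.0, "b": None}, {"a": 3.0, "b": 4.0}]
print(train, holdout)
print(standardize_complete_cases(rows, ["a", "b"]))
```

Because the usable rows depend on which predictors are in the candidate set, the standardization has to be recomputed for each fit rather than once for the whole dataset.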





6.A.4. Stage I trend model 

The trend component of the Stage I model, $\mu_i^{I}(x,y)$, was estimated as follows.



Table 6.A.2. Criteria for inclusion of a predictor variable in the set of potential predictors evaluated for a given seasonal Stage I or Stage II GLM model ("pre-screening criteria"). The set of points for which both data and predictor values were available had to meet all of these criteria for a predictor variable to be considered.

CRITERION                           CONDITION
Fraction of all data eliminated     < 30%
Fraction of presences eliminated    < 20%
Fraction of absences eliminated     < 50%
Number of presences remaining       > 15



Observed data $Z_i(x,y)$ were first transformed to a binary indicator variable $P_i(x,y)$, whose value was 1 if $Z_i(x,y) > 0$ and 0 otherwise. The initial set of 11 potential predictor variables was then pre-screened to remove any predictors whose pattern of missing values would too strongly restrict the data points that could be used to estimate the GLM. Pre-screening criteria are given in Table 6.A.2.
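The Table 6.A.2 pre-screening check for a single candidate predictor can be written as a small Python function. This is a sketch under assumptions: missing predictor values are encoded as None, and for Stage II data (which contain no absences) the absence criterion is treated as automatically satisfied.

```python
def passes_prescreen(presence, predictor):
    """Check one candidate predictor against the Table 6.A.2 pre-screening criteria.

    presence:  0/1 indicators P_i(x, y) at the data locations
    predictor: predictor values at the same locations (None marks a missing value)
    """
    n_all = len(presence)
    n_pres = sum(presence)
    n_abs = n_all - n_pres
    kept = [p for p, v in zip(presence, predictor) if v is not None]
    kept_pres = sum(kept)
    kept_abs = len(kept) - kept_pres

    frac_all_lost = 1.0 - len(kept) / n_all
    frac_pres_lost = 1.0 - kept_pres / n_pres if n_pres else 1.0
    frac_abs_lost = 1.0 - kept_abs / n_abs if n_abs else 0.0  # no absences, e.g. Stage II data

    return (frac_all_lost < 0.30
            and frac_pres_lost < 0.20
            and frac_abs_lost < 0.50
            and kept_pres > 15)


presence = [1] * 20 + [0] * 80
print(passes_prescreen(presence, [0.0] * 100))               # no missing values
print(passes_prescreen(presence, [None] * 5 + [0.0] * 95))   # 5 of 20 presences lost
```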

Predictor variables that passed the pre-screening were centered and standardized, and the R package 'glmulti' (Calcagno and Mazancourt, 2010; Calcagno, 2011) was used to search for the model with the lowest AICc among the set of possible generalized linear models. Two-way interaction effects were allowed, but an interaction term could be included only if both of its corresponding main effects were also in the model (marginality requirement). The GLM used a binomial distribution with a logit link function (Fox, 2008).

The search method used depended on the size of the possible model space, which was restricted by the elimination of some potential predictors in the pre-screening stage (above) and by an upper bound on the number of terms determined by the number of observations. The number of terms in a model (not including the intercept) was restricted to be no greater than the number of observations divided by 10 (Sokal and Rohlf, 1995; Fox, 2008). If the number of predictors and/or the maximum number of terms was sufficiently small, the model space was searched exhaustively for the model with the lowest corrected Akaike Information Criterion (AICc; Sokal and Rohlf, 1995). If the number of predictors and/or the maximum number of terms was intermediate, a genetic algorithm with the default parameters and stopping criteria of deltaM=0.5, conseq=5 was used (Calcagno and Mazancourt, 2010; Calcagno, 2011). If the number of predictors and/or the maximum number of terms was too large for the genetic algorithm to enumerate the model space, an exhaustive search was performed over all possible models with 5 or fewer main effects (allowing two-way interactions within each subset).
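The AICc criterion and the constrained model space described above can be sketched in Python. This brute-force enumeration is an illustration, not glmulti's implementation; how k (the number of estimated parameters passed to `aicc`) is counted, e.g. including the intercept and, for a normal GLM, the dispersion, is an assumption left to the caller.

```python
from itertools import combinations


def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion for k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def candidate_models(predictors, n_obs, max_main=None):
    """Yield term lists for main-effect + two-way-interaction models that respect marginality.

    Terms (intercept excluded) are capped at n_obs // 10; max_main optionally limits
    the number of main effects (e.g. 5 for the fallback exhaustive search).
    """
    max_terms = n_obs // 10
    top = len(predictors) if max_main is None else min(max_main, len(predictors))
    for m in range(top + 1):
        for mains in combinations(predictors, m):
            # Only interactions whose two main effects are both in the model are allowed.
            pairs = list(combinations(mains, 2))
            for r in range(len(pairs) + 1):
                for inter in combinations(pairs, r):
                    terms = list(mains) + [f"{a}:{b}" for a, b in inter]
                    if len(terms) <= max_terms:
                        yield terms


models = list(candidate_models(["BATH", "SST", "CHL"], n_obs=50))
print(len(models))  # 17: the 6-term model with all three interactions exceeds 50 // 10 = 5 terms
```

Each yielded term list would then be fit and scored with `aicc`; the model space grows very quickly with the number of predictors, which is why a genetic algorithm or a cap on main effects becomes necessary.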

The selected model structure was then fit to the data using Matlab Statistics Toolbox function 'glmfit', which 
implements standard Generalized Linear Model fitting by iteratively re-weighted least-squares (Bjorck, 1996; 
Fox, 2008). As before, a binomial distribution and logit link function were used. Use of binomial distributions 
and logit link functions involves assumptions that are discussed in Section 6.A.14. Parametric ± 1 standard 
error confidence bounds on GLM estimates were calculated using Matlab function 'glmval' (following equations 
in Fox, 2008). 
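For readers unfamiliar with iteratively re-weighted least squares, the following pure-Python sketch fits a binomial GLM with a logit link by IRLS. It is not Matlab's 'glmfit' and omits the standard errors and diagnostics that function also returns; the small linear solver and the example data are illustrative assumptions.

```python
import math


def solve(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Binomial GLM with logit link fit by iteratively re-weighted least squares.

    X: rows of (already standardized) predictor values; an intercept column is prepended here.
    y: 0/1 presence indicators. Returns coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi in zip(rows, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = min(max(1.0 / (1.0 + math.exp(-eta)), 1e-10), 1.0 - 1e-10)
            w = mu * (1.0 - mu)          # binomial variance = IRLS weight for the logit link
            z = eta + (yi - mu) / w      # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# One binary predictor: presence rate 1/4 when x = 0 and 3/4 when x = 1.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [1, 0, 0, 0, 1, 1, 1, 0]
print(logistic_irls(X, y))  # approximately [-ln 3, 2 ln 3]
```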

A standard array of GLM diagnostics was produced, including effect tests, deviance goodness-of-fit tests, 
several 'pseudo-R²' measures designed for logistic regression, residual leverage and influence plots, and a
variety of other diagnostic measures (for details see diagnostic tables in main text and Online Supplement 6.2). 
An ROC curve analysis was also performed to assess accuracy of the Stage I trend prediction (see Online 
Supplement 6.2). 

6.A.5. Stage I residual model 

The residual component of the Stage I model, $\varepsilon_i^{I}(x,y)$, was estimated as follows.

First, ROC curve analysis was used to determine the optimal cutoff value of the trend probability, $\mu_i^{I}(x,y)$, to use for classifying the presence/absence data (Cardillo, 2008). ROC curve analysis identifies the cutoff probability for classification that optimizes the tradeoff between sensitivity and specificity, given a training dataset. This cutoff was then applied to transform the trend prediction map $\mu_i^{I}(x,y)$ into a binary classification map (0 = predicted absence, 1 = predicted presence). Use of this ROC curve method to classify the trend can result in global bias of the classification toward the less common class (usually presences); the implications of this are discussed in Section 6.A.14.

A binary indicator variable (the "misclassification indicator") was then created that took the value 1 if the binary classification map based on the trend was incorrect at a data location, and 0 if it was correct. Indicator variograms were estimated and modeled from this misclassification indicator, and ordinary indicator kriging (OIK) was used to produce a map of predicted misclassification probabilities. Kriging predictions >1 or <0 were set to 1 or 0, respectively, to satisfy order relations for probabilities (Deutsch and Journel, 1998; Pebesma, 1998), and the resulting map was the residual component of Stage I, $\varepsilon_i^{I}(x,y)$. Because misclassification of 0's as 1's and of 1's as 0's was treated as equivalent, the OIK geostatistical model assumes that the spatial patterns of misclassification of 1's and 0's are equivalent (symmetry). Implications of this symmetry assumption are discussed in Section 6.A.14.
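The cutoff selection and the misclassification indicator can be sketched in Python. The text does not state which optimality criterion the ROC routine of Cardillo (2008) applies; this sketch assumes Youden's J (sensitivity + specificity - 1), scanning the observed trend probabilities as candidate cutoffs, and the example values are made up.

```python
def youden_cutoff(prob, observed):
    """Cutoff maximizing sensitivity + specificity - 1 (Youden's J), scanning the observed probabilities.

    Ties keep the smallest cutoff. Requires both presences and absences in `observed`.
    """
    n_pos = sum(observed)
    n_neg = len(observed) - n_pos
    best_c, best_j = None, float("-inf")
    for c in sorted(set(prob)):
        tp = sum(1 for p, o in zip(prob, observed) if p >= c and o == 1)
        tn = sum(1 for p, o in zip(prob, observed) if p < c and o == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c


def misclassification_indicator(prob, observed, cutoff):
    """1 where the trend-based presence/absence call disagrees with the observed state, else 0."""
    return [int((p >= cutoff) != (o == 1)) for p, o in zip(prob, observed)]


prob = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]   # hypothetical trend probabilities mu_i^I at data locations
obs = [0, 0, 1, 0, 1, 1]                 # observed presence indicators P_i(x, y)
c = youden_cutoff(prob, obs)
print(c, misclassification_indicator(prob, obs, c))  # 0.3 [0, 0, 0, 1, 0, 0]
```

The resulting 0/1 indicator is the variable whose variogram is modeled and which is then kriged by OIK.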

Variogram models were fit automatically by a non-linear weighted least-squares minimization algorithm (Pebesma, 1998; Pardo-Igúzquiza, 1999), using weights proportional to $N_j/h_j^2$ (the number of pairs of observations used to estimate the empirical variogram at lag $j$, divided by the square of the lag distance $h_j$), as described by Pebesma (1998). Following standard geostatistical practice, the functional form of the variogram and an initial-guess parameter set were specified prior to the least-squares minimization by inspection of the empirical variogram (Isaaks and Srivastava, 1989; Cressie, 1993; Deutsch and Journel, 1998; Chiles and Delfiner, 1999).
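A binned empirical variogram and an $N_j/h_j^2$-weighted fit can be sketched in Python. This is a simpler stand-in for the non-linear minimizer cited above, not gstat's algorithm: it assumes a nugget plus spherical model and a user-supplied list of candidate ranges, and for each range solves for the nugget and partial sill in closed form. The same functions would apply to the Stage I misclassification indicator and to Stage II residuals.

```python
import math


def empirical_variogram(points, lag_width, n_lags):
    """Binned semivariogram as (mean lag h_j, gamma_j, N_j) tuples for non-empty bins.

    points: list of (x, y, value); pairs farther apart than n_lags * lag_width are ignored.
    """
    bins = [[0.0, 0.0, 0] for _ in range(n_lags)]  # [sum of distances, sum of squared diffs, N]
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            h = math.hypot(xi - xj, yi - yj)
            b = int(h // lag_width)
            if b < n_lags:
                bins[b][0] += h
                bins[b][1] += (zi - zj) ** 2
                bins[b][2] += 1
    return [(sh / n, 0.5 * sq / n, n) for sh, sq, n in bins if n > 0]


def spherical(h, a):
    """Unit-sill spherical variogram structure with range a."""
    r = h / a
    return 1.5 * r - 0.5 * r ** 3 if r < 1.0 else 1.0


def fit_spherical(vario, candidate_ranges):
    """Fit gamma(h) = c0 + c1 * spherical(h, a) by weighted least squares with weights N_j / h_j**2.

    Returns (weighted SSE, nugget c0, partial sill c1, range a) for the best range, or None.
    """
    best = None
    for a in candidate_ranges:
        sw = sws = swss = swg = swsg = 0.0
        for h, g, n in vario:
            if h <= 0:
                continue
            w = n / h ** 2
            s = spherical(h, a)
            sw += w
            sws += w * s
            swss += w * s * s
            swg += w * g
            swsg += w * s * g
        det = sw * swss - sws * sws
        if det <= 1e-12 * sw * swss:
            continue  # spherical term (nearly) constant over the lags: c0 and c1 not identifiable
        c0 = (swss * swg - sws * swsg) / det
        c1 = (sw * swsg - sws * swg) / det
        if c0 < 0 or c1 < 0:
            continue  # reject a negative nugget or partial sill
        wsse = sum(n / h ** 2 * (g - c0 - c1 * spherical(h, a)) ** 2 for h, g, n in vario if h > 0)
        if best is None or wsse < best[0]:
            best = (wsse, c0, c1, a)
    return best


# Synthetic empirical variogram generated from nugget 0.1, partial sill 0.9, range 600.
vario = [(float(h), 0.1 + 0.9 * spherical(h, 600.0), 50) for h in range(100, 1001, 100)]
print(fit_spherical(vario, [200.0, 400.0, 600.0, 800.0]))
```

The $1/h_j^2$ factor gives short lags the most influence, which matters most for kriging predictions near data points.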

OIK produces parametric estimates of uncertainty (kriging standard error) for each location in the residual 
prediction map (Pebesma, 1998; Deutsch and Journel, 1998). An ROC curve analysis was also performed to 
assess accuracy of the Stage I residual prediction (see Online Supplement 6.2). 
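Ordinary indicator kriging of the misclassification indicator, including the clipping to [0, 1], can be sketched in Python. This is an illustration only: it assumes a nugget plus spherical covariance model with known parameters, solves the ordinary kriging system with a Lagrange multiplier (weights sum to 1), and uses made-up data.

```python
import math


def spherical_cov(h, nugget, psill, a):
    """Covariance implied by a nugget + spherical variogram with range a (total sill = nugget + psill)."""
    if h == 0:
        return nugget + psill
    r = h / a
    return psill * (1.0 - (1.5 * r - 0.5 * r ** 3)) if r < 1.0 else 0.0


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ordinary_kriging(points, x0, y0, nugget, psill, a):
    """Ordinary kriging estimate and variance at (x0, y0); weights sum to 1 via a Lagrange multiplier."""
    n = len(points)

    def cov(p, q):
        return spherical_cov(math.dist(p[:2], q[:2]), nugget, psill, a)

    lhs = [[cov(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    lhs.append([1.0] * n + [0.0])
    rhs = [cov(p, (x0, y0)) for p in points] + [1.0]
    sol = solve(lhs, rhs)
    weights, lagrange = sol[:n], sol[n]
    estimate = sum(w * p[2] for w, p in zip(weights, points))
    variance = nugget + psill - sum(w * c for w, c in zip(weights, rhs[:n])) - lagrange
    return estimate, variance


def indicator_kriging_probability(points, x0, y0, nugget, psill, a):
    """OIK of a 0/1 indicator, clipped to [0, 1]; returns (probability, kriging standard error)."""
    est, var = ordinary_kriging(points, x0, y0, nugget, psill, a)
    return min(1.0, max(0.0, est)), math.sqrt(max(var, 0.0))


# Hypothetical misclassification indicators at the corners of a 10 x 10 square.
pts = [(0, 0, 1), (10, 0, 0), (0, 10, 1), (10, 10, 0)]
print(indicator_kriging_probability(pts, 5, 5, 0.0, 1.0, 50.0))  # centre: equal weights, 0.5
```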

6.A.6. Final Stage I model 

Because the trend and residual components of the Stage I model are probabilities, they can be combined 
using the laws of conditional probability to arrive at the full Stage I model as follows (Ross, 2007): 

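$$p_i(x,y) = \Pr\big(\,[\text{trend model predicts } i \text{ is present AND trend model is not wrong}]\ \text{OR}\ [\text{trend model predicts } i \text{ is not present AND trend model is wrong}]\,\big) \qquad \text{(Eq. 5)}$$

which can be translated to

$$p_i(x,y) = \mu_i^{I}(x,y)\,\big(1 - \varepsilon_i^{I}(x,y)\big) + \big(1 - \mu_i^{I}(x,y)\big)\,\varepsilon_i^{I}(x,y) \qquad \text{(Eq. 6)}$$

which simplifies to the final Stage I model:

$$p_i(x,y) = \mu_i^{I}(x,y) + \varepsilon_i^{I}(x,y) - 2\,\mu_i^{I}(x,y)\,\varepsilon_i^{I}(x,y) \qquad \text{(Eq. 7)}$$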



Parametric ±1 SE confidence intervals for the final Stage I model, $p_i(x,y)$, were derived by applying Equation 7 to the parametric confidence intervals for $\mu_i^{I}(x,y)$ and $\varepsilon_i^{I}(x,y)$ calculated using the GLM model and the geostatistical (OIK) model, respectively.
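Eq. 7 and its interval propagation can be sketched in Python. The text does not spell out how Eq. 7 is applied to the interval endpoints; this sketch assumes the bounds are taken as the minimum and maximum of Eq. 7 over all four combinations of endpoints. Because Eq. 7 is linear in each argument separately (bilinear), those corners bound it exactly over the whole interval box. The input values are made up.

```python
from itertools import product


def final_stage1(mu, eps):
    """Eq. 7: presence probability from trend probability mu and misclassification probability eps."""
    return mu + eps - 2.0 * mu * eps


def final_stage1_bounds(mu_bounds, eps_bounds):
    """Range of Eq. 7 over the box of (mu, eps) -1 SE / +1 SE endpoints (corners suffice: Eq. 7 is bilinear)."""
    vals = [final_stage1(m, e) for m, e in product(mu_bounds, eps_bounds)]
    return min(vals), max(vals)


print(final_stage1(0.8, 0.1))                            # about 0.74
print(final_stage1_bounds((0.7, 0.9), (0.05, 0.15)))     # about (0.64, 0.86)
```

Note that when eps exceeds 0.5 the dependence on mu reverses sign, so pairing lower with lower and upper with upper endpoints would not always give ordered bounds.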

6.A.7. Stage II trend model 

The trend component of the Stage II model, $\mu_i^{II}(x,y)$, was estimated as follows.

Data at non-zero locations were first transformed for normality using a Box-Cox power transform (see Section 6.A.2.) whose parameter λ was chosen by a maximum-likelihood procedure (Figure 6.A.1) (Box and Cox, 1964; Dror, 2006). Power-law family models have recently been found to outperform other commonly used statistical models (e.g., Poisson) for describing distributions of seabird group sizes in our study region (Beauchamp, 2011), lending further motivation to the use of the Box-Cox family of transformations for this purpose.
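The maximum-likelihood choice of λ can be sketched in Python using the standard Box-Cox profile log-likelihood, $\ell(\lambda) = -\tfrac{n}{2}\ln\hat\sigma^2(\lambda) + (\lambda - 1)\sum_k \ln x_k$ (constants dropped). The grid from -2 to 2 in steps of 0.05 and the example data are assumptions; the report's exact search procedure (Dror, 2006) may differ.

```python
import math
import statistics


def boxcox(values, lam):
    """Eq. 4 for strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_profile_loglik(values, lam):
    """Profile log-likelihood of lambda for a normal model of the transformed data (constants dropped)."""
    n = len(values)
    var_mle = statistics.pvariance(boxcox(values, lam))  # divide-by-n (maximum-likelihood) variance
    return -0.5 * n * math.log(var_mle) + (lam - 1.0) * sum(math.log(v) for v in values)


def boxcox_mle(values, grid=None):
    """Lambda on the grid with the largest profile log-likelihood."""
    if grid is None:
        grid = [i / 20 for i in range(-40, 41)]  # -2.00 to 2.00 in steps of 0.05
    return max(grid, key=lambda lam: boxcox_profile_loglik(values, lam))


spue = [math.exp(t) for t in (-2, -1, 0, 1, 2)]  # made-up, log-symmetric positive values
print(boxcox_mle(spue))  # 0.0, i.e. a log transform
```

The same function applies to positive predictor variables in Section 6.A.2; shifting constants such as the 1 in CHL or the 3 in PHIM would be applied before transforming.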

The initial set of 11 potential predictor variables was then pre-screened to remove any predictors whose 
pattern of missing values would too greatly influence the data points that could be used to estimate the GLM. 
Pre-screening criteria are given in Table 6.A.2. 



The predictor variables were centered and standardized, and the R package 'glmulti' (Calcagno and Mazancourt, 2010; Calcagno, 2011) was used to search for the model with the lowest AICc in the same way as described for Stage I (Section 6.A.4.), except that in this case the GLM used a normal distribution with an identity link function (Fox, 2008).

The selected model structure was then fit to the data using the Matlab Statistics Toolbox function 'glmfit', which implements standard Generalized Linear Model fitting by iteratively re-weighted least squares (Bjorck, 1996; Fox, 2008). A normal distribution and identity link function were used. Use of the normal distribution here involves assumptions that are discussed in Section 6.A.14. Parametric ±1 standard error uncertainty bounds on GLM estimates were calculated using the Matlab function 'glmval' (following equations in Fox, 2008).

Because spatial autocorrelation biases the estimation of GLM parameters, we followed an iterative procedure to fit the final GLM in gstat (Pebesma, 1998; Chiles and Delfiner, 1999):

1. Calculate residuals and estimate the residual variogram (see Section 6.A.8.).

2. Re-calculate the fit with gstat, using the residual variogram.

3. Re-calculate residuals and repeat fitting with gstat (steps 2 and 3) until the residual variogram has converged (determined by inspection).

Figure 6.A.1. Box-Cox transformation of non-zero relative abundance (SPUE) data. Example for Dovekie in Winter: (a) selection of λ by maximum-likelihood procedure; (b) normal probability plots and histograms before and after transformation.
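Steps 1 to 3 amount to iterated generalized least squares (GLS) for the normal/identity-link trend. The Python sketch below illustrates the idea under stated assumptions; it is not gstat. `fit_cov` is a hypothetical stand-in for the residual-variogram fit, a fixed iteration count replaces convergence by inspection, and the design matrix X includes an explicit intercept column.

```python
import math


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def gls_fit(X, y, coords, cov_fn):
    """Generalized least squares: beta = (X' C^-1 X)^-1 X' C^-1 y, with C_ij = cov_fn(distance_ij)."""
    n, p = len(y), len(X[0])
    C = [[cov_fn(math.dist(coords[i], coords[j])) for j in range(n)] for i in range(n)]
    cinv_x = [solve(C, [X[i][k] for i in range(n)]) for k in range(p)]  # columns of C^-1 X
    cinv_y = solve(C, y)
    xtcx = [[sum(X[i][j] * cinv_x[k][i] for i in range(n)) for k in range(p)] for j in range(p)]
    xtcy = [sum(X[i][j] * cinv_y[i] for i in range(n)) for j in range(p)]
    return solve(xtcx, xtcy)


def independent(h):
    """Uncorrelated unit-variance errors: makes the first pass ordinary least squares."""
    return 1.0 if h == 0 else 0.0


def iterate_gls(X, y, coords, fit_cov, n_iter=5):
    """Fit, compute residuals, refit a residual covariance, repeat (fixed iteration count here)."""
    cov_fn = independent
    for _ in range(n_iter):
        beta = gls_fit(X, y, coords, cov_fn)
        resid = [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, X)]
        cov_fn = fit_cov(resid, coords)  # hypothetical residual-variogram fit
    return beta, cov_fn


def fixed_exponential_fit(resid, coords):
    """Hypothetical stand-in: ignores the residuals and returns a fixed exponential covariance."""
    return lambda h: math.exp(-h / 2.0)


coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 1.0, 3.0, 5.0]
beta, _ = iterate_gls(X, y, coords, fixed_exponential_fit)
print(beta)  # close to [1.0, 2.0] because y = 1 + 2x exactly
```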

A standard array of GLM diagnostics was produced, including effect tests, goodness-of-fit F tests, R² and several 'pseudo-R²' measures to allow comparison with the Stage I logistic regression, residual leverage and influence plots, and a variety of other diagnostic measures (for details, see diagnostic tables in main text and Online Supplement 6.2).



6.A.8. Stage II residual model 

The residual component of the Stage II model, $\varepsilon_i^{II}(x,y)$, was estimated as follows.

First, residuals from the trend model fit were calculated by subtracting the predicted values from the observed values. Residuals were calculated in Box-Cox transformed space to satisfy the normality assumptions of geostatistical methods. Residual variograms were then estimated and modeled using gstat, and simple kriging (SK) was used to produce a map of predicted residuals. The resulting map was the residual component of Stage II, $\varepsilon_i^{II}(x,y)$.

Variogram models were fit automatically by a non-linear weighted least-squares minimization algorithm (Pebesma, 1998; Pardo-Igúzquiza, 1999), using weights proportional to $N_j/h_j^2$ (the number of pairs of observations used to estimate the empirical variogram at lag $j$, divided by the square of the lag distance $h_j$), as described by Pebesma (1998). Following standard geostatistical practice, the functional form of the variogram and an initial-guess parameter set were specified prior to the least-squares minimization by inspection of the empirical variogram (Isaaks and Srivastava, 1989; Cressie, 1993; Deutsch and Journel, 1998; Chiles and Delfiner, 1999).

SK was also used to produce parametric estimates of uncertainty (kriging standard error) for each location in 
the residual prediction map (Pebesma, 1998; Deutsch and Journel, 1998). 
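Because the Box-Cox-space residuals have a known mean of zero once the trend is removed, simple kriging reduces to solving $C w = c_0$. The Python sketch below assumes a nugget plus exponential covariance with known parameters (the report's variogram models were chosen case by case by inspection) and returns the estimate and the kriging standard error; the residual values are made up.

```python
import math


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def exp_cov(h, nugget, psill, a):
    """Covariance for a nugget + exponential variogram with distance parameter a."""
    return nugget + psill if h == 0 else psill * math.exp(-h / a)


def simple_kriging(points, x0, y0, nugget, psill, a, mean=0.0):
    """Simple kriging with a known mean (0 for trend residuals); returns (estimate, kriging SE)."""
    n = len(points)
    C = [[exp_cov(math.dist(points[i][:2], points[j][:2]), nugget, psill, a) for j in range(n)]
         for i in range(n)]
    c0 = [exp_cov(math.dist(p[:2], (x0, y0)), nugget, psill, a) for p in points]
    w = solve(C, c0)
    est = mean + sum(wi * (p[2] - mean) for wi, p in zip(w, points))
    var = nugget + psill - sum(wi * ci for wi, ci in zip(w, c0))
    return est, math.sqrt(max(var, 0.0))


pts = [(0.0, 0.0, 1.0), (1.0, 0.0, -0.5), (0.0, 1.0, 0.2)]  # hypothetical residuals
print(simple_kriging(pts, 1.0, 0.0, 0.0, 1.0, 2.0))        # at a data point: its residual, SE near 0
print(simple_kriging(pts, 1000.0, 1000.0, 0.0, 1.0, 2.0))  # far away: mean 0, SE near the sill's root
```

Far from the data the kriged residual reverts to zero, so the final Stage II prediction reverts to the GLM trend.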

6.A.9. Final Stage II model 

In Box-Cox transformed space, the final Stage II model is simply the sum of trend and residual components: 

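$$E\{Z_i^{\text{transformed}}(x,y) \mid P_i(x,y) = 1\} = \mu_i^{II}(x,y) + \varepsilon_i^{II}(x,y) \qquad \text{(Eq. 8)}$$

The result can be back-transformed to yield a prediction in the original units of SPUE:

$$E\{Z_i(x,y) \mid P_i(x,y) = 1\} = \begin{cases} \left(\lambda\, E\{Z_i^{\text{transformed}}(x,y) \mid P_i(x,y) = 1\} + 1\right)^{1/\lambda}, & \lambda \neq 0 \\ \exp\left(E\{Z_i^{\text{transformed}}(x,y) \mid P_i(x,y) = 1\}\right), & \lambda = 0 \end{cases} \qquad \text{(Eq. 9)}$$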



Back-transformed values were constrained to lie between 0 and 110% of the observed data maximum.
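The final Stage II computation (Eq. 8, then the Eq. 9 back-transform with the 0 to 110% constraint) can be sketched in Python. How transformed values outside the range of the forward transform are handled is not stated in the text; mapping them to 0 (for λ > 0) or to the upper cap (for λ < 0) is an assumption of this sketch.

```python
import math


def inv_boxcox(y, lam):
    """Back-transform of Eq. 9: inverse of the Box-Cox transform in Eq. 4."""
    if abs(lam) < 1e-12:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0.0:
        # Outside the range of the forward transform: map to 0 (lam > 0) or +inf (lam < 0).
        return 0.0 if lam > 0 else math.inf
    return base ** (1.0 / lam)


def final_stage2(trend, residual, lam, observed_max):
    """Eq. 8 (trend + kriged residual in transformed space), then Eq. 9, clamped to [0, 1.1 * max]."""
    value = inv_boxcox(trend + residual, lam)
    return min(max(value, 0.0), 1.1 * observed_max)


print(final_stage2(1.5, 0.5, 0.5, 10.0))   # (0.5 * 2 + 1) ** 2 = 4.0
print(final_stage2(10.0, 0.0, 0.5, 10.0))  # 36 before clamping, capped at 1.1 * 10
```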



Parametric ±1 SE confidence intervals for the final back-transformed Stage II model, $E\{Z_i(x,y) \mid P_i(x,y)=1\}$, were derived by applying Equations 8 and 9 to the parametric confidence intervals for $\mu_i^{II}(x,y)$ and $\varepsilon_i^{II}(x,y)$ calculated using the GLM model and the geostatistical (SK) model, respectively.



6.A.10. Final Stage I x II model

Stage I and Stage II models were combined as described in Section 6.A.1. (Equation 3) to produce each seasonal predictive map of the unconditional expected value of SPUE, which we will refer to as the "Stage I x II" prediction map or $E\{Z_i(x,y)\}$. Specifically, $E\{Z_i(x,y)\}$ is equal to the product of Equation 9 (the final back-transformed Stage II prediction) and Equation 7 (the final Stage I model prediction). Note that the Stage I x II predictions are in back-transformed units (SPUE).

Parametric uncertainty bounds (±1 SE) for the final Stage I x II maps were obtained by plugging the confidence intervals for $\mu_i^{I}(x,y)$, $\varepsilon_i^{I}(x,y)$, $\mu_i^{II}(x,y)$, and $\varepsilon_i^{II}(x,y)$ described above into Equations 7 and 9, and multiplying Equation 7 by Equation 9 for each set of uncertainty bounds.

6.A.11. Relative uncertainty calculations 

In order to simplify comparison of uncertainties among different model components, uncertainties were converted to relative values that fall between 0 and 1, with 0 representing low uncertainty (high certainty) and 1 representing high uncertainty (low certainty). To further aid in interpretation, relative certainty classes were defined as follows: high certainty class (relative uncertainty ≤ 0.5), medium certainty class (0.5 < relative uncertainty ≤ 0.65), and low certainty class (relative uncertainty > 0.65). The implications of a particular relative uncertainty value or certainty class for model performance can be determined by examining the diagnostic tables in the main text, which give cross-validation error statistics for each certainty class, and the cross-validation relative uncertainty calibration plots in Appendix 6.C. (described in Section 6.A.12. below).
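Assigning certainty classes and summarizing cross-validation error per class can be sketched in Python. Placing a value of exactly 0.65 in the medium class is an assumption of this sketch, and mean absolute error is used as one example of a per-class error statistic; the holdout values are made up.

```python
def certainty_class(rel_unc):
    """Map a relative uncertainty in [0, 1] to the high / medium / low certainty classes."""
    if rel_unc <= 0.5:
        return "high"
    if rel_unc <= 0.65:  # a value of exactly 0.65 is placed in the medium class here
        return "medium"
    return "low"


def mae_by_certainty_class(rel_unc, predicted, observed):
    """Mean absolute cross-validation error within each certainty class found in the holdout set."""
    errors = {}
    for u, p, o in zip(rel_unc, predicted, observed):
        errors.setdefault(certainty_class(u), []).append(abs(p - o))
    return {cls: sum(e) / len(e) for cls, e in errors.items()}


print(mae_by_certainty_class([0.1, 0.2, 0.9], [1.0, 2.0, 3.0], [1.0, 4.0, 0.0]))
# {'high': 1.0, 'low': 3.0}
```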

6.A.11.1. Stage I

The relative uncertainty of Stage I model predictions is expressed as the scaled negative log odds ratio, SNLOR. The negative log odds ratio, NLOR, is the negative natural logarithm of the ratio of the odds of correct binary classification (absence = 0, presence = 1) using the Stage I model to the odds of correct binary classification under a null model:



$$\mathrm{NLOR} = -\ln\left(\frac{\mathrm{Odds}_{\text{model}}}{\mathrm{Odds}_{\text{null}}}\right) \qquad \text{(Eq. 10)}$$



To calculate the odds of correct classification under the Stage I model, $\mathrm{Odds}_{\text{model}}$, we first consider the uncertainty of the Stage I model prediction relative to the cutoff probability $c$ used for binary classification (in this case, the optimal cutoff probability determined by ROC curve analysis). The uncertainty around the Stage I model prediction $p$ can be modeled by a normal curve on the logit scale, with mean equal to the logit of the Stage I prediction and standard deviation equal to the larger of the upper and lower 1 SE confidence half-widths (also on the logit scale):

$$z_p \sim N\Big(\operatorname{logit}[p],\ \max\big(\operatorname{logit}[p_{+1SE}] - \operatorname{logit}[p],\ \operatorname{logit}[p] - \operatorname{logit}[p_{-1SE}]\big)\Big) \qquad \text{(Eq. 11)}$$



Then the probability of the true predicted value lying above the cutoff probability $c$ is given by

$$p_{\text{above}} = \Pr\big(z_p > \operatorname{logit}(c)\big) \qquad \text{(Eq. 12)}$$

and the probability of the true predicted value falling below the cutoff probability is

$$p_{\text{below}} = \Pr\big(z_p < \operatorname{logit}(c)\big) \qquad \text{(Eq. 13)}$$
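Eqs. 11 to 13 can be evaluated directly with Python's standard-library normal distribution. This sketch covers only these three equations (Eqs. 14 to 17 did not survive extraction and are not reconstructed); inputs are assumed to lie strictly between 0 and 1, with the -1 SE bound below and the +1 SE bound above the prediction.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def prob_above_below(p, p_minus_1se, p_plus_1se, c):
    """Eqs. 11-13: probabilities that the Stage I prediction lies above / below the cutoff c."""
    sd = max(logit(p_plus_1se) - logit(p), logit(p) - logit(p_minus_1se))  # Eq. 11 spread
    z = NormalDist(mu=logit(p), sigma=sd)
    p_below = z.cdf(logit(c))   # Eq. 13
    return 1.0 - p_below, p_below  # (Eq. 12, Eq. 13)


print(prob_above_below(0.5, 0.4, 0.6, 0.5))     # (0.5, 0.5): prediction sits on the cutoff
print(prob_above_below(0.9, 0.85, 0.95, 0.1))   # almost certainly above a cutoff of 0.1
```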



The classifier itself is subject to error, which we estimate by its performance in cross-validation: the true positive ($p_{TP}$), true negative ($p_{TN}$), false positive ($p_{FP}$), and false negative ($p_{FN}$) rates of the classifier from the cross-validation confusion matrix at cutoff value $c$. The odds of correct classification using the Stage I model can then be calculated as:

Eq. 14



To calculate the odds of correct classification under a null model, $\mathrm{Odds}_{\text{null}}$, we consider a null model in which the true and predicted presence/absence (1/0) states are given by Bernoulli random variables with probabilities $p_1$ (equal to the global prevalence of the species) and $c$ (equal to the optimal cutoff probability from ROC curve analysis), respectively. Then the null odds of correct classification are:

Eq. 15



For a given set of cross-validation error rates ($p_{TP}$, $p_{TN}$, $p_{FP}$, and $p_{FN}$), the minimum and maximum possible values of the NLOR are:

Eq. 16



The scaled NLOR, SNLOR, is calculated so that SNLOR = 0 at the minimum possible value of the NLOR and SNLOR = 1 at the maximum possible value of the NLOR:

Eq. 17



Values of SNLOR closer to 0 indicate model predictions that have relatively high odds of being correct compared to a null model (high certainty), whereas values closer to 1 indicate model predictions that have relatively low odds of being correct compared to a null model (low certainty). Relative uncertainties were calculated in this way for the Stage I trend, Stage I residual, and final Stage I models, using the cross-validation ROC curve cutoff $c$ and the cross-validation error rates ($p_{TP}$, $p_{TN}$, $p_{FP}$, and $p_{FN}$) determined from the ROC analysis of the trend, residual, and final Stage I predictions, respectively. Below, the final Stage I relative uncertainty is denoted $\sigma_{i,\mathrm{rel}}^{I}(x,y)$, and is equal to the value of SNLOR for the final Stage I model for species/group $i$ at location $(x,y)$.

6.A.11.2. Stage II 

The relative uncertainty of Stage II trend, residual, and final model predictions was calculated as the ratio of the prediction variance to the appropriate error variance (trend prediction variance: total sample variance minus residual variogram sill; residual prediction variance: residual variogram sill; final prediction variance: total sample variance). Below, the final Stage II relative uncertainty is denoted $\sigma_{i,\mathrm{rel}}^{II}(x,y)$.

6.A.11.3. Stage I x II

The relative uncertainty of final Stage I x II model predictions was calculated by combining the relative uncertainties of the final Stage I and Stage II models as follows:

Eq. 18



The rationale behind Equation 18 is that the Stage II relative uncertainty applies if the species is present (which is true with probability $p_i(x,y)$), whereas the Stage I relative uncertainty applies if the species is absent (which is true with probability $1 - p_i(x,y)$).
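The Stage II relative uncertainty and the Stage I x II combination can be sketched in Python. Eq. 18 itself did not survive extraction; the combination below assumes it is the presence-probability-weighted average implied by the stated rationale, so it should be read as an illustration rather than the report's formula. All variance values are made up.

```python
def stage2_reference_variance(component, total_var, residual_sill):
    """Error variance that each Stage II prediction variance is divided by."""
    return {
        "trend": total_var - residual_sill,
        "residual": residual_sill,
        "final": total_var,
    }[component]


def stage2_relative_uncertainty(pred_var, component, total_var, residual_sill):
    """Relative uncertainty = prediction variance / reference error variance."""
    return pred_var / stage2_reference_variance(component, total_var, residual_sill)


def stage_1x2_relative_uncertainty(p, rel_unc_stage1, rel_unc_stage2):
    """ASSUMED form of Eq. 18: weight Stage II by P(present) and Stage I by P(absent)."""
    return p * rel_unc_stage2 + (1.0 - p) * rel_unc_stage1


print(stage2_relative_uncertainty(0.5, "final", 2.0, 1.2))  # 0.25
print(stage_1x2_relative_uncertainty(0.25, 0.8, 0.4))       # about 0.7
```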

6.A.12. Model evaluation and uncertainty calibration 

In addition to the standard GLM effect tests and diagnostics, model predictive performance was evaluated in and out of the training set using a variety of error statistics, error plots, and ROC curve analysis. As a final summary of model performance in cross-validation, and to aid the reader in interpreting relative uncertainty values for the final Stage I x II model, an uncertainty calibration plot was produced. For each location in the holdout set, the model developed from training data was used to predict the value at that location, and the magnitude of the difference between actual and predicted values (absolute error) was plotted against the Stage I x II relative uncertainty value (Appendix 6.C.). Robust linear loess smoothing lines (Burkey, 2009) are plotted to show how actual out-of-set average prediction errors relate to parametric relative uncertainty estimates. Separate lines are plotted for overall error and for error when the species or group was present (since most species are relatively rare in any given survey, presences are harder to predict than absences). Similar relative uncertainty calibration plots were produced for Stage I predictions (presence/absence).

Uncertainty calibration plots, ROC analyses, error statistics, and other model evaluation diagnostics are 
included in the diagnostic tables in the main report, in Appendix 6.C., and in Online Supplement 6.2. 

6.A.13. Combination of seasonal climatological maps to produce annual climatological maps

For each species and species group $i$, seasonal maps of climatological SPUE (Stage I x II predictions) were combined to produce annual maps as follows:

Using the laws of probability and the expectation operator (Ross, 2007), this procedure can be shown to yield an unbiased estimate of the SPUE prediction for the entire year, given that (1) each seasonal model prediction is the unconditional expected value of SPUE, $Z_i(x,y)$, and (2) the seasons are defined as non-overlapping and together cover the entire climatological year. These two conditions are true by definition.

Annual integrated presence probability maps were produced by combining the seasonal climatological presence probability predictions (Stage I predictions), assuming statistical independence of the seasonal probabilities. Given 4 seasons, there are 15 possible ways in which a species or group can be present in at least one season. Represented as four-digit binary codes, these are: 1000, 0100, 0010, 0001, 1100, 1010, 1001, 0110, 0011, 0101, 1110, 1011, 1101, 0111, 1111. The probabilities of these outcomes were summed to produce the annual integrated presence probability, $p_i(x,y)^{\text{annual}}$, which is equivalent to the annual climatological site occupancy probability for species/group $i$ at each location $(x,y)$.
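The sum over the 15 presence patterns can be written directly in Python, treating the seasons as independent as the text does. The seasonal probabilities below are hypothetical values for a single location.

```python
from itertools import product
from math import prod


def annual_presence_probability(seasonal_p):
    """Sum the probabilities of all season patterns with at least one presence (seasons independent)."""
    total = 0.0
    for pattern in product((0, 1), repeat=len(seasonal_p)):
        if any(pattern):  # 15 patterns for 4 seasons, e.g. (1, 0, 0, 0) ... (1, 1, 1, 1)
            total += prod(p if present else 1.0 - p for p, present in zip(seasonal_p, pattern))
    return total


seasons = [0.1, 0.2, 0.3, 0.4]  # hypothetical winter/spring/summer/fall presence probabilities
print(annual_presence_probability(seasons))  # equals 1 - 0.9 * 0.8 * 0.7 * 0.6 = 0.6976
```

Under independence the same quantity is the complement of "absent in every season", $1 - \prod_s (1 - p_s)$, which is a quicker way to compute it.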

To estimate the relative uncertainty associated with each annual map, the weighted average of the corresponding 
seasonal relative uncertainty maps was calculated, using the frequencies of occurrence of the species in each 
season as weights. For the annual SPUE map the relative uncertainty is given by: 
Eq. 20

For the annual integrated presence probability map, the relative uncertainty is given by:

Eq. 21

It can be shown that these relative uncertainties are monotonically related to the variance of the annual prediction error, but this relationship will not necessarily be linear for two reasons (Ross, 2007):

1. The relative uncertainty of Stage I predictions is based on a log odds ratio, and

2. Seasonal estimates of $Z_i(x,y)$ may be correlated, so the variance of their sum, unlike the expected value of their sum, is not necessarily the sum of the seasonal variances.

Thus we rely on uncertainty calibration plots (plots of cross-validation error vs. relative uncertainty, Section 6.A.12.) to interpret the precise meaning of the relative uncertainty value for each species/group annual model.

6.A.14. Summary and implications of model assumptions 

The seasonal predictive modeling approach described above makes a number of assumptions. To the extent 
these assumptions are violated, accuracy of predictions and uncertainty estimates may suffer. In this section 
we briefly review the major assumptions and their implications. The degree to which violations of model assumptions affect the performance of any given seasonal model can be assessed by considering the cross-validation performance statistics described in Section 6.A.12 and reported in the main text diagnostic tables, Appendix 6.C, and Online Supplement 6.2.

Important general assumptions: 

• Stationarity of pattern over time within seasons and among years 

Statistically, stationarity in this context means that the region-wide mean, variance, and spatial structure of abundance and occurrence patterns do not change over the time period we studied. Ecologically, stationarity implies that the ecosystem has not undergone any fundamental shifts in patterns and processes (e.g., climate trends, ocean climate regime shifts, introduced species, changes in patterns of human activities like fishing). If this assumption is violated, temporal variation will show up as non-spatially structured error ("white noise") in the model result. Model parameters and predictions may also be biased (cross-validation errors will not be centered at 0). The predicted spatial pattern may be an amalgam of different patterns that occurred at different time periods (e.g., "smearing" of hotspots that moved from year to year). If there are major changes in the underlying processes, the model will also be less generalizable to other time periods.

• Stationarity of environmental predictor climatologies 

The use of long-term climatologies of time-varying environmental predictors (such as SST and stratification) assumes that the long-term seasonal mean spatial patterns of these variables have not
changed over time. Major changes in the underlying environmental patterns and processes will make the 
model less generalizable to other time periods. 

• Unbiased year-to-year sampling (no temporal effect included) 

If the sampling pattern is non-random within seasons and/or across years, GLM parameter estimates 
and parametric uncertainties could be biased and inaccurate. This problem will be exacerbated if the 
assumptions of temporal stationarity of predictors and response are also violated. The Manomet survey 
was conducted on ships of opportunity, so samples were not random in space or time; therefore some 
biases due to unbalanced effort are expected. 

• Perfect detectability; freedom from other kinds of sample bias 

To the extent that a given species or species group is not perfectly detectable by the sampling protocol, 
relative occurrence and abundance indices will be biased compared to true abundance and occurrence 
values. Predictions from this model should be considered relative, rather than absolute, estimates of 
occurrence and abundance. In addition to detectability, similar biases can result from attraction of certain 
species to the survey platform (boats). Finally, systematic study biases may exist in the types of species 
that were recorded. For example, we found very few records of passerines in the Manomet dataset, even 
though there is evidence of offshore sightings of these species from other sources. These and other 
birds that are rare but not absent in the offshore may require other survey and modeling approaches if 
they are of conservation concern. 

• Constant relationship between sampling effort, relative indices of occurrence and abundance, and true values of occurrence and abundance

Not only are species unlikely to be perfectly detectable, the relationship between our relative indices of 
occurrence and abundance and the true values of occurrence and abundance could vary in time and 
space, depending on differences in observers, weather conditions, animal behavior, etc. Such variation 
introduces an unaccounted-for source of measurement error into the data.

Important Stage I assumptions 

• Binomial distribution and logit link function 

To the extent that these distributional assumptions are violated, trend predictions may be 
biased and parametric confidence intervals inaccurate. 

• Use of receiver operating characteristic (ROC) curve optimal cutoff analysis to classify residuals from the trend model

Use of the ROC classifier may introduce bias into the final presence probability estimates in exchange for balancing overall sensitivity and specificity.

• Symmetry assumption for misclassification probability field 

Misclassification of absences as presences may not show the same spatial pattern as misclassification 
of presences as absences; if that is the case, then model predictions may be biased and the model 
may perform better for one type of misclassification than for the other, even though parametric uncertainty
estimates are the same. 
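The optimal cutoff step can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes the "optimal" operating point is the one that maximizes Youden's J (sensitivity + specificity - 1), which is one common criterion. The function name `roc_optimal_cutoff` and the example data are hypothetical.

```python
import numpy as np


def roc_optimal_cutoff(prob, observed):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    prob: predicted presence probabilities from the trend model.
    observed: 0/1 absence/presence observations at the same locations.
    Youden's J is an assumed criterion, used here for illustration.
    """
    prob = np.asarray(prob, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    n_pos = observed.sum()
    n_neg = (~observed).sum()
    best_t, best_j = None, -np.inf
    # Each distinct predicted value is a candidate cutoff;
    # a location is classified as "presence" when prob >= t.
    for t in np.unique(prob):
        pred = prob >= t
        sensitivity = (pred & observed).sum() / n_pos
        specificity = (~pred & ~observed).sum() / n_neg
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


# Hypothetical example data
prob = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9])
obs = np.array([0, 0, 1, 0, 1, 1, 1])
cutoff, youden_j = roc_optimal_cutoff(prob, obs)  # cutoff == 0.6, youden_j == 0.75
presence_class = (prob >= cutoff).astype(int)
```

Choosing a single cutoff trades missed presences against false presences; a different criterion (for example, a fixed sensitivity target) would move the threshold.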

Important Stage II assumptions 

• Normality and linearity of Box-Cox transformed predictors and responses in the Stage II trend model

We assume that the Box-Cox transform in Stage II is sufficient to achieve normality of the residual distribution
and linearity of underlying response-predictor relationships. Since the underlying seabird relative
abundance data are based on counts (divided by transect area to create a quasi-continuous density
estimate), this requires that we assume the continuous Box-Cox transformed Gaussian distribution
used to represent non-zero relative abundance is an adequate approximation to the underlying discrete
probability distribution. The appropriateness of these assumptions is difficult to test directly, and the reader
should rely on cross-validation performance statistics to judge the extent to which these assumptions
were approximately correct.

• Trans-Gaussian assumption in the Stage II residual (geostatistical) model 

Simple Kriging also assumes approximate normality; therefore the adequacy of the Box-Cox transformation 
to achieve normality of the residual distribution is also important to the accuracy of the kriging prediction 
(especially the validity of the kriging variance). 

• Back-transform issues (extrapolation of the CDF tail) 

When back-transforming Stage II predictions, we have arbitrarily cut off the upper end of the distribution 
at 110% of the data maximum, which may not always be appropriate. This is only expected to influence 
the highest predicted values. 
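A simplified sketch of the transform and the capped back-transform described above. Assumptions: the closed-form inverse Box-Cox is used here in place of the report's CDF-based back-transform (whose tail extrapolation this does not reproduce), the Box-Cox parameter is supplied by the caller, and the function names are hypothetical. Only the cap at 110% of the data maximum comes from the text.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform of strictly positive values (e.g. non-zero SPUE)."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def inv_boxcox_capped(z, lam, data_max, cap_factor=1.10):
    """Invert Box-Cox, then cap at cap_factor * data_max (110% by default)."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        y = np.exp(z)
    else:
        # Keep the base positive; transformed values at or below -1/lam
        # (for lam > 0) have no real inverse.
        base = np.maximum(lam * z + 1.0, np.finfo(float).tiny)
        y = base ** (1.0 / lam)
    return np.minimum(y, cap_factor * data_max)


# Hypothetical non-zero SPUE observations and a Box-Cox parameter.
spue = np.array([0.5, 2.0, 8.0, 40.0])
lam = 0.25
z = boxcox(spue, lam)
round_trip = inv_boxcox_capped(z, lam, spue.max())                # recovers spue
extreme = inv_boxcox_capped(boxcox(100.0, lam), lam, spue.max())  # capped at ~44
```

As the text notes, the cap only affects the highest predictions: values inside the observed range pass through unchanged.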

Important Stage I x II assumptions

• Separability of abundance and presence/absence patterns 

We have assumed that abundance is conditionally independent of presence/absence (that is, abundance 
can be modeled independently of presence probability). If this assumption is violated, then the Stage I x II
estimates will be biased. The direction of this bias will depend on the sign of the dependence, and on the 
Box-Cox transformation parameter. The degree of bias in predictions can be assessed (and corrected for) 
by examining cross-validation bias statistics in the diagnostic tables. 
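Under the separability assumption, the Stage I x II prediction can be read as a hurdle-style product of the two stages. The sketch below assumes that interpretation (expected SPUE = presence probability x expected SPUE given presence); the function name and inputs are hypothetical, and any transform-related bias correction is omitted.

```python
import numpy as np


def unconditional_spue(p_presence, spue_if_present):
    """Expected SPUE = P(present) * E[SPUE | present].

    Valid only if abundance is conditionally independent of
    presence/absence (the separability assumption).
    """
    p = np.asarray(p_presence, dtype=float)
    a = np.asarray(spue_if_present, dtype=float)
    return p * a


# Hypothetical Stage I probabilities and Stage II conditional SPUE per cell.
p_stage1 = np.array([0.0, 0.25, 0.8])
spue_stage2 = np.array([5.0, 4.0, 10.0])
expected = unconditional_spue(p_stage1, spue_stage2)
```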

Important assumptions of annual maps 

• Seasonal estimates of expected SPUE, $Z_t(x,y)$, are uncorrelated with each other.

If seasonal estimates of SPUE are positively correlated with each other, then the summation of 
unconditional expected values will still be correct but the relationship between actual prediction error and 
the predicted relative uncertainty value will be affected. The cross-validation uncertainty calibration plots 
should be used as a guide to the true relationship between relative uncertainty and prediction error for 
each annual model. 

• Seasonal estimates of presence probability $p_t(x,y)$ are independent of each other.

If presence probabilities are not independent from season to season, then the integrated annual presence 
probability maps will over- or underestimate annual site occupancy probability, depending on the sign of
the dependence. 
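Under the two independence assumptions above, the annual maps reduce to simple per-cell operations: seasonal expected SPUE values are summed, and the probability of presence in at least one modeled season is one minus the product of the seasonal absence probabilities. A minimal sketch, assuming seasonal grids stacked along the first axis; the function names and data are hypothetical.

```python
import numpy as np


def annual_presence_probability(seasonal_p):
    """P(present in at least one modeled season), assuming independent seasons.

    seasonal_p: array shaped (n_seasons, ...) holding only the modeled seasons.
    """
    seasonal_p = np.asarray(seasonal_p, dtype=float)
    return 1.0 - np.prod(1.0 - seasonal_p, axis=0)


def annual_expected_spue(seasonal_spue):
    """Sum of seasonal unconditional expected SPUE values (shape (n_seasons, ...))."""
    return np.sum(np.asarray(seasonal_spue, dtype=float), axis=0)


# Hypothetical two-cell example with three modeled seasons.
p = np.array([[0.5, 0.1],
              [0.5, 0.0],
              [0.0, 0.1]])
spue = np.array([[2.0, 0.2],
                 [1.0, 0.0],
                 [0.5, 0.3]])
p_annual = annual_presence_probability(p)   # approx. [0.75, 0.19]
spue_annual = annual_expected_spue(spue)    # approx. [3.5, 0.5]
```

If presence is positively dependent across seasons, the product term is too small and the annual probability is overestimated, which is the direction of bias described above.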




Appendix 6.B. Environmental Predictor Variables

6.B.1. Overview

The 11 potential predictor variables considered in
this study are listed in the body of this chapter (Section
6.8.5.) and the statistical transformations applied prior to 
inclusion in regression models are discussed in Appendix 
6.A. (Section 6.A.2.). This appendix provides additional 
detail about the potential predictor variables, including 
maps of each predictor before and after the transformations 
discussed in Appendix 6.A were applied (Figures 6.B.1 
through 6.B.11). Additional detail about the choice of 
transformations is provided in Online Supplement 6.1. 

6.B.2 Bathymetry and coastline (BATH, SLOPE, SLPSLP, DIST, SSDIST)

Depth data were extracted from the 3 arc-second
(approximately 84 m in our study area) horizontal resolution
NOAA U.S. Coastal Relief Model (CRM) (NOAA National
Geophysical Data Center [NGDC] 2010), and merged with
the 1 arc-minute (approximately 1.6 km in our study area)
ETOPO1 database (Amante and Eakins 2009) in offshore
areas not covered by the CRM (the ETOPO1 database
was projected to match the CRM and bilinearly resampled
to the 83.9 m CRM cell size prior to merging). See Chapter
2 for more information about different bathymetry datasets
in this region. Depths below the MLLW vertical datum were
negative. Values greater than +2 m in the CRM dataset (i.e.,
elevations of 2 meters above MLLW or higher) were set to
+2 m. The merged dataset was then reprojected to geographic
coordinates (WGS84 datum) and smoothed (rectangular
block-average using the centroids of the 83.9 m grid cells)
to the 30 arc-second model grid. Any cells whose
block-averaged depth value at the final 30 arc-second resolution
was >0 were set to 0, so that all depth values were ≤0.
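The rectangular block-averaging step can be sketched as below: each fine-grid value is assigned to the coarse 30 arc-second cell containing its centroid, and the members of each cell are averaged. This is an illustration under assumptions: coordinates are decimal degrees, the coarse grid is indexed from a hypothetical lower-left corner (`lon0`, `lat0`), and the function name is hypothetical. The final clamp to ≤0 follows the text.

```python
import numpy as np

ARC30_DEG = 30.0 / 3600.0  # 30 arc-seconds in decimal degrees


def block_average(lon, lat, values, lon0, lat0, ncols, nrows, cell=ARC30_DEG):
    """Average fine-grid values within each coarse cell containing their centroids.

    Coarse cells with no member centroids are returned as NaN.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)
    col = np.floor((lon - lon0) / cell).astype(int)
    row = np.floor((lat - lat0) / cell).astype(int)
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    idx = row[inside] * ncols + col[inside]
    sums = np.bincount(idx, weights=values[inside], minlength=nrows * ncols)
    counts = np.bincount(idx, minlength=nrows * ncols)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 -> NaN for empty blocks
    return mean.reshape(nrows, ncols)


# Hypothetical fine-grid centroids (degrees) and depths (m, negative below MLLW).
lon = [0.001, 0.002, 0.009, 0.0012]
lat = [0.001, 0.001, 0.001, 0.0045]
depth = [-10.0, -20.0, -30.0, 5.0]
bath = np.minimum(block_average(lon, lat, depth, 0.0, 0.0, 2, 1), 0.0)  # force <= 0
```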

To reduce the amplitude of potential artifacts from merging
and resampling of bathymetry datasets, to reduce the
influence of outlier points known to exist in both the CRM
(NOAA NGDC 2010) and ETOPO1 (Amante and Eakins
2009) datasets, and to allow for the likely imprecision of
any possible influence of bottom topography on
above-surface bird distribution, the 30 arc-second merged
bathymetry grid was reprojected to the UTM18N coordinate
system, resampled bilinearly to the original 83.9 m CRM
grid resolution, filtered with a 21x21 cell (1.8 x 1.8 km)
Gaussian blur kernel (1σ = 220 m; filter calculated to 4σ
and weights re-normalized to sum to 1) (Gonzalez and
Woods 1992), and then rectangular block-averaged back
to the 30 arc-second grid, using centroids of the 83.9 m
projected grid to identify block members. The result was
the BATH predictor grid.
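The filtering described above can be sketched with NumPy as follows. Assumptions not stated in the text: the kernel is sampled at cell centres within the 21x21 window (reaching about ±839 m, roughly 3.8σ at 83.9 m cells), grid edges are handled by replicating edge values, and the function names are hypothetical. Because both kernels are symmetric, the sliding-window correlation used here is equivalent to convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size=21, sigma_m=220.0, cell_m=83.9):
    """Square Gaussian blur kernel with weights re-normalized to sum to 1."""
    half = size // 2
    offsets = np.arange(-half, half + 1) * cell_m  # distances from centre, metres
    g = np.exp(-0.5 * (offsets / sigma_m) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def box_kernel(size=21):
    """Uniform (box) kernel with weights summing to 1."""
    return np.full((size, size), 1.0 / size**2)


def filter2d(grid, kernel):
    """Same-size filtering of a 2-D grid with edge-replicated padding."""
    half = kernel.shape[0] // 2
    padded = np.pad(np.asarray(grid, dtype=float), half, mode="edge")
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Hypothetical noisy depth grid at 83.9 m resolution.
rng = np.random.default_rng(0)
depth = -50.0 + rng.normal(0.0, 2.0, size=(60, 60))
bath = filter2d(depth, gaussian_kernel())
```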

Figure 6.B.1. Predictor: BATH (Bathymetry).
Original units: meters water depth relative to
mean lower low water.

Figure 6.B.2. Predictor: SLOPE (Bathymetric
slope). Original units: %

Slope was calculated on the filtered bathymetry while still
at 83.9 m resolution, as the maximum percentage change
in depth from each grid cell to its four
neighbors. The calculated slope was then filtered again
with a 21x21 cell Gaussian blur kernel and a 21x21 cell
box kernel. The result was rectangular block-averaged
back to the 30 arc-second grid, producing the SLOPE
predictor grid.
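A minimal sketch of the slope definition above, interpreting "percentage change" as rise over run x 100 between a cell and each of its four edge-adjacent neighbours on the 83.9 m grid. Applying the same function to the filtered slope grid would give slope-of-slope. The edge handling (missing neighbours are ignored) and the function name are assumptions.

```python
import numpy as np


def max_neighbor_slope_pct(depth, cell_m=83.9):
    """Slope (%) as the maximum absolute depth change to the 4 edge neighbours.

    depth: 2-D array of (filtered) depths in metres on a regular grid.
    Cells on the grid edge only use the neighbours that exist.
    """
    depth = np.asarray(depth, dtype=float)
    padded = np.pad(depth, 1, mode="constant", constant_values=np.nan)
    centre = padded[1:-1, 1:-1]
    neighbours = np.stack([
        padded[:-2, 1:-1],  # row above
        padded[2:, 1:-1],   # row below
        padded[1:-1, :-2],  # column to the left
        padded[1:-1, 2:],   # column to the right
    ])
    dz = np.abs(neighbours - centre)
    return 100.0 * np.nanmax(dz, axis=0) / cell_m


# Hypothetical plane deepening by 8.39 m per 83.9 m cell, i.e. a 10% slope.
plane = -np.tile(np.arange(5) * 8.39, (4, 1))
slope = max_neighbor_slope_pct(plane)
```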

Slope-of-slope was calculated as the slope of the 
doubly filtered slope grid while still at 83.9 m resolution. 
This calculation was subsequently filtered with a 21x21 
cell Gaussian blur kernel and a 21x21 cell box kernel, 
and block-averaged to the 30 arc-second grid, resulting 
in the SLPSLP predictor grid. 

As a result of the filtering and block-averaging 
operations, the bathymetry, slope, and slope-of- 
slope predictor values in each 30 arc-second grid 
cell represent a weighted average of "local" bottom 
topographic characteristics. In the case of BATH, 
information in each focal grid cell can come from up
to 1.6 km away from the grid centroid (using 3σ as
the effective cutoff distance of the Gaussian filter). In
the case of SLOPE, due to additional filtering steps, 
information in each focal grid cell can come from up to 
3.1 km away from the centroid. In the case of SLPSLP, 
this distance is 4.7 km. Thus these three predictors, 
in addition to quantifying different features of benthic 
topography, also contain information deriving from 
several different topographic scales. 

Error statistics of bathymetry derived from the same 
underlying data as the CRM are described in Chapter 2 
of this report, and bathymetric uncertainty for this region 
has also been studied by Calder (2006). Slope and 
slope-of-slope have additional error because they are 
multiple-point statistics. Over most of the study area, 
relative errors will typically be <5% for depth, <10% for 
slope, and <20% for slope-of-slope. 

Distance from shore (DIST) was calculated by 
measuring the shortest straight-line distance (in km in 
UTM18N projected coordinates) from the centroid of 
each grid cell to the 1:250,000 World Vector Shoreline 
(Soluri and Woodson 1990), which was found to agree 
within ±0.5 km with the 0 m isobath contour in
the original CRM bathymetry dataset in our study
region, and was thus considered of sufficient accuracy
given the chosen grid resolution. For grid cells whose 
centroids fell on land, distance to shore was set to 0. 
Figure 6.B.3. Predictor: DIST (Distance from shore). Original
units: meters.

Figure 6.B.4. Predictor: SSDIST (Signed distance from shelf-break).
Original units: meters.

Figure 6.B.5. Predictor: SST (Sea Surface Temperature). Original units: °C

Distance from shelf edge (SSDIST) was calculated by
measuring the shortest straight-line distance (in km in
UTM18N projected coordinates) from the centroid of
each grid cell to the shelf edge, defined as the 200 m
isobath. Distances from grid cells offshore of the 200 m
isobath were assigned a negative sign, whereas
distances from inshore cells were assigned a positive
sign. For purposes of this measurement, a line
feature representing the -200 m isobath contour
was created by contouring the merged, smoothed
bathymetry product using ESRI Spatial Analyst's
contour tool. Although these distances were
measured in the projected UTM18N system, their
values were recorded on the same 30 arc-second
model grid as other variables, and represent
distances from the geographic centroids of those
grid cells. Based on the scales and uncertainties of
the source data, errors in these distances are of the
order of 0.5-2 km, or about 1 to 2%.
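Both distance predictors reduce to the shortest distance from each projected cell centroid to a polyline (the shoreline for DIST, the 200 m isobath for SSDIST), with SSDIST adding a sign. The sketch below assumes projected x/y coordinates in metres, a polyline without repeated consecutive vertices, and a caller-supplied offshore flag; the function names are hypothetical, and no GIS library behaviour is implied.

```python
import numpy as np


def distance_to_polyline(points, line):
    """Shortest straight-line distance from each point to a polyline.

    points: (n, 2) projected x/y coordinates (e.g. UTM metres).
    line:   (m, 2) polyline vertices (e.g. a contour); m >= 2.
    """
    p = np.asarray(points, dtype=float)[:, None, :]  # (n, 1, 2)
    v = np.asarray(line, dtype=float)
    a = v[:-1][None, :, :]                           # segment starts (1, m-1, 2)
    ab = (v[1:] - v[:-1])[None, :, :]                # segment vectors
    # Position of the closest point along each segment, clamped to the segment.
    t = np.sum((p - a) * ab, axis=-1) / np.sum(ab * ab, axis=-1)
    t = np.clip(t, 0.0, 1.0)[..., None]
    closest = a + t * ab
    return np.min(np.linalg.norm(p - closest, axis=-1), axis=1)


def signed_shelf_distance(points, isobath_200, offshore):
    """SSDIST-style sign convention: negative offshore of the isobath, positive inshore."""
    d = distance_to_polyline(points, isobath_200)
    return np.where(np.asarray(offshore, dtype=bool), -d, d)


# Hypothetical straight isobath segment and three cell centroids.
isobath = [[0.0, 0.0], [10.0, 0.0]]
centroids = [[5.0, 3.0], [12.0, 0.0], [-3.0, 4.0]]
ssdist = signed_shelf_distance(centroids, isobath, [True, False, False])
```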

Finally, a land mask was created by finding cells of 
the 30 arc-second model grid within which ≥51% of
the contained CRM values (based on centroids) had 
positive depth values (i.e., land). Model predictions 
were not produced for these predominantly land- 
covered grid cells. 
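The land-mask rule can be expressed compactly. A minimal sketch, assuming each CRM cell has already been assigned the index of the model-grid cell that contains its centroid (as in the block-averaging step); `land_mask` and the example data are hypothetical.

```python
import numpy as np


def land_mask(block_ids, fine_depth, n_blocks, threshold=0.51):
    """Flag model cells where >= threshold of member CRM cells are land (depth > 0).

    block_ids: index of the model-grid cell containing each CRM cell centroid.
    Model cells with no members are not flagged.
    """
    block_ids = np.asarray(block_ids)
    is_land = (np.asarray(fine_depth, dtype=float) > 0).astype(float)
    land = np.bincount(block_ids, weights=is_land, minlength=n_blocks)
    total = np.bincount(block_ids, minlength=n_blocks)
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = land / total
    return frac >= threshold


mask = land_mask([0, 0, 0, 1, 1], [5.0, 3.0, -1.0, -2.0, 1.0], n_blocks=3)
```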

6.B.3 Benthic surficial sediments (PHIM) 

The USGS usSEABED bottom sample database for
the Atlantic coast of the US (Reid et al. 2005) was
used as described in Chapter 3 of this report to create
gridded maps of mean $\phi$ (where $\phi = -\log_2 d$ and $d$ is
the surficial sediment grain size in mm). Dr. John Goff kindly
provided a quality-controlled version of the merged
usSEABED parsed and extracted datasets, which
included unpublished updates to the usSEABED
database, selected only surficial sediment records,
eliminated duplicates and spurious records, and
applied the bias correction of Goff et al. (2008) to the
parsed values. Goff's quality-controlled dataset of
mean $\phi$ estimates was interpolated using ordinary
kriging (with locally quadratic trend) to produce
estimates of mean $\phi$ on the 30 arc-second model
grid (PHIM). See Chapter 3 for characterization of
uncertainty in mean grain size predictions.
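For reference, the phi scale used for PHIM converts grain diameter in millimetres as $\phi = -\log_2 d$, so finer sediments have larger phi. A small sketch of the conversion (function names hypothetical). Averaging in phi units corresponds to taking a geometric mean of grain diameter, as the last lines illustrate.

```python
import numpy as np


def grain_size_to_phi(d_mm):
    """Krumbein phi: phi = -log2(d / 1 mm); larger phi means finer sediment."""
    return -np.log2(np.asarray(d_mm, dtype=float))


def phi_to_grain_size(phi):
    """Inverse conversion: grain diameter in mm from phi."""
    return 2.0 ** (-np.asarray(phi, dtype=float))


phis = grain_size_to_phi([2.0, 1.0, 0.25, 0.0625])  # granule, coarse sand, fine sand, silt boundary
# Mean phi of 0.5 mm and 2 mm grains maps back to their geometric mean (1 mm).
d_from_mean_phi = phi_to_grain_size(grain_size_to_phi([0.5, 2.0]).mean())
```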

6.B.4 Pelagic environmental variables (STRT, SST, TUR, CHL, ZOO)

Seasonal climatologies of the following pelagic 
environmental variables were taken from Chapter 4 
of this report (Section 4.3). These variables included 
water column stratification (STRT) from optimally- 
interpolated vertical profiles of temperature and 
salinity, sea surface temperature (SST) from satellite 
data, surface chlorophyll-a concentration (CHL) and 
a turbidity proxy (TUR) from satellite ocean color 
data, and zooplankton biomass from near-surface 
plankton tows (ZOO). 
Figure 6.B.6. Predictor: SLPSLP (Slope of the bathymetric slope).
Original units: % of %.

Figure 6.B.7. Predictor: PHIM (Mean phi of surficial sediments).
Original units: mean $-\log_2$(grain size in mm) of surficial sediments.

Figure 6.B.8. Predictor: STRT (Water-column Stratification). Original units: long-term climatological average stratification, measured
as surface seawater density (kg m⁻³) minus density at 50 m. More negative values indicate stronger stratification.

Figure 6.B.9. Predictor: TUR (Turbidity proxy). Original units: Normalized water-leaving radiance at 670 nm.

Figure 6.B.10. Predictor: CHL (Surface chlorophyll-a concentration). Original units: mg m⁻³
Appendix 6.C. Species and Group Seasonal Profiles

Black-legged Kittiwake (p. 172)
Common Loon (p. 174)
Common Tern (p. 176)
Cory's Shearwater (p. 178)
Dovekie (p. 180)
Great Black-backed Gull (p. 182)
Great Shearwater (p. 184)
Herring Gull (p. 186)
Laughing Gull (p. 188)
Northern Fulmar (p. 190)
Northern Gannet (p. 192)
Pomarine Jaeger (p. 194)
Sooty Shearwater (p. 196)
Wilson's Storm-Petrel (p. 198)
'No birds sighted' (p. 200)
Alcids, less common (p. 202)
Coastal Waterfowl (p. 204)
Jaegers (p. 206)
Phalaropes (p. 208)
Shearwaters, less common (p. 210)
Small Gulls, less common (p. 212)
Storm-Petrels, less common (p. 214)
Terns, less common (p. 216)
Unidentified Gulls (p. 218)
Non-modeled Groups (p. 220)
  Cormorants (2 species) (p. 220)
  Skuas, less common (p. 220)
  Rare Visitors (10 species) (p. 220)
Figure 6.B.11. Predictor: ZOO (Zooplankton biomass). Original units: mean displacement volume per volume of water strained (ml m⁻³).




Appendix 6.C. Species and Group Seasonal Profiles 
[Figure panels: Annual Presence Relative Uncertainty; Seasonal SPUE Relative Uncertainty; Seasonal Presence Probability Maps; Seasonal Presence Relative Uncertainty Maps; Annual Cross-Val Observed vs. Predicted; Annual SPUE Uncertainty Calibration Plot; Annual Cross-Validation ROC Plot; Annual Presence Uncertainty Calibration Plot]

Figure 6.C.1. Guide to the elements presented in each species/group seasonal model profile. A shrunk-down image of an example two-
page model profile spread is shown, with capital letters marking the different graphic elements as they are described in Section 6.C.1.

6.C.1. Profile interpretation guide 

This appendix presents two-page profiles for each modeled species and group giving more detailed results of 
predictive models of abundance and presence/absence. In each profile, the annual and seasonal predictive 
maps are shown with original data points overlaid for comparison, annual and seasonal maps of relative 
model uncertainty are given, separate maps are shown for abundance (sightings per unit effort, SPUE) and for 
occurrence (probability of presence), and model fit and model uncertainty plots based on cross-validation are 
provided to help assess model accuracy. 

Rather than number and caption each figure and table in these profiles repetitively, we have developed this 
profile interpretation guide that explains each element of the 24 two-page profiles that follow. Figure 6.C.1 
(above) is a schematic illustration of the layout of each profile. The large white letters in transparent black 
boxes indicate each element of the layout. Each element, lettered A through O, is explained below:

A. Annual climatological (long-term average) predictive map of relative abundance. SPUE is used as a proxy for 
abundance. Cool colors represent low and warm colors represent high SPUE. Land is represented by black, areas 
with insufficient data to make a prediction are white, and the offshore planning area identified by New York DOS is 
shown by a dotted magenta line. All non-zero SPUE data (training and validation datasets) are overlaid as colored 
dots on the same colorscale as the gridded SPUE predictions. 

B. Monthly pattern of occurrence - Temporal histogram showing frequency of occurrence by month. Data are presented
from two complementary sources to more fully capture pelagic and nearshore species. The Manomet dataset (red 
bars) was collected offshore by ship-based surveys and the eBird dataset (blue bars) consists mostly of onshore 
and nearshore surveys. Differences between histograms from the two datasets may represent an onshore-offshore 
distributional gradient. 



C. Seasonal climatological (long-term average) predictive maps of relative abundance (SPUE, no. indiv./km²/15 min).
All non-zero SPUE data (training and validation datasets) for each season are overlaid as colored dots on the same 
colorscale as the gridded SPUE predictions. 

D. Cross-validation Observed vs. Predicted plot (mean observed vs. mean predicted SPUE in 10x10 cell bins [~9x9 km]).
Binning was necessary because cross-validation data points did not exactly coincide across seasons.

E. Distribution of mean relative uncertainty - Map showing relative uncertainty in the Stage I x II predictive model.
Uncertainty is a function of distance from survey locations, spatial autocorrelation, and accuracy of the trend
regressions. Relative uncertainty values closer to 0 represent predictions that are expected, on average, to have
lower error; values closer to 1 represent predictions that are expected, on average, to have greater error.

F. Model structure table, showing percentages of variance attributable to each component of the model and the length 
scales of spatial autocorrelation, for each season and averaged over all seasons. 

G. Seasonal relative uncertainty maps. See Appendix 6.A., Section 6.A.11. 

H. Cross-validation relative uncertainty calibration plot for annual SPUE (using mean predicted abundance and the
mean value of the annual Stage I x II relative uncertainty for each 10x10 cell [~9x9 km] bin). See Appendix 6.A.,
Section 6.A.12. for details.

I. Annual integrated predicted probability of presence. Probability of species being present in at least one of the modeled 
seasons. See Appendix 6.A., Section 6.A.13. for details. Seasons modeled for each species are shown in Table 6.6. 

J. Representative photo of species or one species in a species group. 

K. Seasonal predicted probability of presence (Stage I final model prediction). Absence data are indicated by small black 
dots, and presence data are indicated by larger open black circles. 

L. Annual cross-validation ROC plot for presence/absence prediction, using maximum probability in 10x10 cell bins 
[~9x9 km] as the predictor of whether at least one presence was observed in that bin. Red line indicates 1:1, black line
is ROC curve. Small blue dot indicates the optimal operating point on the ROC curve, used to determine the threshold 
that optimizes the tradeoff between sensitivity and specificity. 

M. Relative uncertainty of annual presence probability prediction. 

N. Seasonal maps of relative uncertainty of presence probability prediction. 

O. Cross-validation relative uncertainty calibration plot for annual integrated presence probability (using maximum 
predicted probability and the mean value of the Stage I relative uncertainty for each 10x10 cell [~9x9 km] bin). Note
that presence, absence, and overall error rates reflect the tradeoff between sensitivity and specificity; some error 
rates may go up even as relative uncertainty goes down. The relative uncertainty is based on the overall odds of the 
model prediction being correct compared to a null model; see Appendix 6.A. for details. 

6.C.2. A note on error masking 

Throughout all sections of this chapter, white space on maps is used to indicate places where predictions 
either were not made (due to insufficient seabird survey data or insufficient information on environmental 
predictors), or were considered too unreliable to display. For some species and groups, we developed two 
versions of the "error masks" used to hide unreliable predictions: a more conservative mask that is used in the 
main body of this report, and a less conservative mask that is used here in Appendix 6.C. The affected species/ 
groups are: Great Shearwater, Sooty Shearwater, Wilson's Storm-Petrel, Pomarine Jaeger, Northern Fulmar, 
Less Common Storm-Petrels, Less Common Shearwaters, and Jaegers. We use the less conservative masks 
here in Appendix 6.C. in order to show the maximum extent of model predictions, and allow the user to judge 
for themselves whether predictions are well-supported by data (original data points are overlaid on maps and 
uncertainty maps are presented alongside each prediction map). The user is cautioned to closely examine 
the distribution of data and the uncertainty maps, as well as diagnostic plots such as the cross-validation 
uncertainty calibration, to determine the degree of confidence appropriate for any particular prediction. Users 
should be especially careful when drawing inferences about "hotspots" or "coldspots" that are not supported 
by any nearby data points; these result solely from extrapolation from environmental conditions, and should be 
considered only potential hot or cold spots until those areas are actually surveyed. For example, predictions 
made in Long Island Sound for species with no presence data points in the Sound are unreliable. Such 
predictions have been masked out in the maps presented in the main body of this report, but may not always 
be masked out by the less conservative masks in this Appendix or Online Supplement 6.2. 
[Species and group profile pages (figures), e.g., Alcids, less common; Coastal Waterfowl. Each profile has a "Stage I x II: Relative Abundance Predictive Model" page and a "Stage I: Presence Probability Predictive Model" page.]
Appendix 6.D. Hotspot Predictive Model Profiles



6.D.1. Overview 

This appendix presents detailed model 
profiles for the abundance, species 
richness, and Shannon diversity index 
hotspot analyses described in this chapter. 
The format is similar to Appendix 6.C. 
Annual and seasonal maps are presented, 
along with cross-validation observed vs. 
predicted plots and cross-validation relative 
uncertainty calibration plots. Annual cross- 
validation analyses were conducted in 
20x20 cell (~18x18 km) bins. The larger
bin size was necessary to include enough 
cross-validation data for all species 
simultaneously. Each figure in this appendix 
is explained in the associated caption; for 
detailed methods, see Appendix 6.A. 
Figure 6.D.1. Annual hotspot relative uncertainty map. The relative uncertainty value is a dimensionless
number scaled between 0 and 1, where values closer to 0 indicate greater certainty. The relative uncertainty
value at each location is the same for all hotspot quantities (abundance, richness, and diversity index),
because it is a function of the underlying trend and spatial model uncertainty for each species/group, but
the relationship of the relative uncertainty value to actual prediction error varies for each of the quantities
analyzed. That relationship can be seen in the uncertainty calibration plots for each predicted quantity:
Figure 6.D.4 (Abundance), Figure 6.D.8 (Richness), and Figure 6.D.12 (Shannon Diversity Index).



Figure 6.D.2. Seasonal hotspot relative uncertainty maps for (A) Spring, (B) Summer, (C) Fall, and (D) Winter. Explanation of relative uncertainty values is as in Figure 6.D.1.




ABUNDANCE HOTSPOTS 



[Figure panels: Annual abundance hotspot map (x-axis: Longitude); Annual Abundance Hotspots, Cross-Val. Uncertainty Calibration Plot]

Figure 6.D.3. Annual predicted seabird relative abundance hotspot map. Shading 
represents the sum of the predicted relative abundance (SPUE) for all modeled 
species and groups over all seasons in which they were modeled (# indiv./km²/15 min).
Abundance was treated as zero for all seasons in which a species or group
was not modeled. Note that this method may overestimate the abundance seen in 
any given 15-minute survey due to unaccounted-for correlations among species 
(a correction factor could be derived from the cross-validation results in Figure 
6.D.5). 
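The hotspot summation described in this caption can be sketched as a sum over species/groups and seasons, with unmodeled seasons contributing zero. Assumptions: predictions are stacked as (taxa, seasons, cells) with NaN marking unmodeled combinations; masking of unreliable cells is handled separately and is not reproduced here; the function name is hypothetical.

```python
import numpy as np


def abundance_hotspot(spue):
    """Total predicted SPUE per cell, summed over taxa and seasons.

    spue: array (n_taxa, n_seasons, ...) with NaN wherever a species or
    group was not modeled in that season; those entries count as zero.
    """
    return np.nansum(np.asarray(spue, dtype=float), axis=(0, 1))


# Hypothetical: 2 taxa x 2 seasons x 2 cells; taxon 0 not modeled in season 1.
spue = np.array([[[1.0, 2.0], [np.nan, np.nan]],
                 [[0.5, 0.0], [3.0, 1.0]]])
hotspot = abundance_hotspot(spue)
```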
Figure 6.D.4. Cross-validation: relative uncertainty calibration
plot for annual abundance hotspot map. The mean absolute
cross-validation error calculated in 20x20 cell bins is plotted vs.
the mean relative uncertainty value in the same bins; dashed
lines show range of values in each bin. Red line is a robust
linear loess smoothing fit (+/- 1 standard deviation; dashed red
lines). Although the pattern is noisy, mean absolute cross-validation
error, measured in units of SPUE (# indiv./km²/15 min),
can be seen to decrease smoothly as relative uncertainty
decreases. Note log scale on both axes.
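The binning behind these calibration plots can be sketched as follows: cross-validation points are grouped into square blocks of 20x20 model cells, and the mean absolute error and mean relative uncertainty are computed for each occupied bin. The robust loess fit shown in the figure is not reproduced; the inputs and function name are hypothetical.

```python
import numpy as np


def binned_calibration(row, col, abs_error, rel_uncertainty, bin_cells=20):
    """Mean absolute CV error and mean relative uncertainty per occupied bin.

    row, col: model-grid indices of each cross-validation point.
    Bins are returned in lexicographic (bin_row, bin_col) order.
    """
    row = np.asarray(row, dtype=int)
    col = np.asarray(col, dtype=int)
    keys = np.stack([row // bin_cells, col // bin_cells], axis=1)
    # Label each point with the index of its (bin_row, bin_col) pair.
    _, label = np.unique(keys, axis=0, return_inverse=True)
    label = label.ravel()
    n = np.bincount(label)
    mean_err = np.bincount(label, weights=np.asarray(abs_error, dtype=float)) / n
    mean_unc = np.bincount(label, weights=np.asarray(rel_uncertainty, dtype=float)) / n
    return mean_err, mean_unc


err, unc = binned_calibration(row=[0, 5, 25, 3], col=[0, 19, 0, 45],
                              abs_error=[1.0, 3.0, 10.0, 4.0],
                              rel_uncertainty=[0.2, 0.4, 0.9, 0.5])
```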
Figure 6.D.6. Seasonal predicted relative abundance hotspot maps for (A) Spring,
(B) Summer, (C) Fall, and (D) Winter. Hotspots were calculated by summing predicted
abundances of all modeled species and groups in the indicated season.
Figure 6.D.5. Cross-validation: observed vs. predicted values
for the annual abundance hotspot map. Observations not 
included in the model training set were averaged in 20x20 
cell bins and are plotted vs. the mean prediction value in the 
same bins. The 1:1 line (blue line) and a robust linear loess 
smoothing fit (red line) are plotted. Dashed horizontal lines 
show range of predictions in each bin. The x-axis is com- 
pressed compared to the y-axis for two reasons: first, the 
model predicts the average value expected over a long time 
at any given location, and individual observations vary around 
this average; second, many more predictions than observa- 
tions are averaged to create each bin value (many of the 
20x20 cell bins contain only one observation, but they contain 
up to 400 predictions). The loess smoothing fit suggests that 
predicted total abundance correlates well with observed total 
abundance, but that observed total abundance is systemati- 
cally lower than the prediction by a factor of approximately 1.5 
to 2. Note log scale on both axes. 




SPECIES RICHNESS HOTSPOTS 
Figure 6.D.7. Annual predicted seabird species richness hotspot map. Shading
represents the median of upper and lower bounds on predicted species richness at
each location, obtained by summing minimum and maximum number of species that 
could have been present over all species and groups, treating each species or group 
as present if its predicted relative abundance was above a threshold. See Section 
6.8.8.2 for methods. Species and groups that were not modeled in any season did not 
contribute to richness. Note that this estimate of richness overestimates the actual 
number of species that would be seen in any given 15-minute survey both because it 
is unlikely that half of the species in each group will always be present, and because 
of unaccounted-for correlations among species (a correction factor could be derived 
from the cross-validation results in Figure 6.D.9). 
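One reading of the richness bounds described above, sketched under explicit assumptions: a taxon (species or group) counts as present where its predicted relative abundance exceeds a caller-supplied threshold; a present group contributes at least one species (lower bound) and at most all of its species (upper bound); and the median of the two bounds is their midpoint. The threshold, group sizes, and function name are placeholders. See Section 6.8.8.2 for the method actually used.

```python
import numpy as np


def richness_midpoint(spue, n_species, threshold):
    """Midpoint of lower and upper bounds on species richness per location.

    spue: (n_taxa, ...) predicted relative abundance per species or group.
    n_species: number of species in each taxon (1 for a single species).
    threshold: abundance above which a taxon is treated as present.
    """
    spue = np.asarray(spue, dtype=float)
    present = np.nan_to_num(spue, nan=0.0) > threshold
    k = np.asarray(n_species).reshape((-1,) + (1,) * (spue.ndim - 1))
    lower = present.sum(axis=0)        # one species per present taxon
    upper = (present * k).sum(axis=0)  # every species in each present taxon
    return 0.5 * (lower + upper)


# Hypothetical: one single species and two groups (4 and 3 species), two cells.
richness = richness_midpoint([[2.0, 0.0], [1.5, 0.3], [0.0, 4.0]],
                             n_species=[1, 4, 3], threshold=0.5)
```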
[Figure panel: Annual Species Richness Hotspots, Cross-Val. Uncertainty Calibration Plot]



Figure 6.D.8. Cross-validation: relative uncertainty calibration 
plot for annual richness hotspot map. The mean absolute cross- 
validation error calculated in 20x20 cell bins is plotted vs. the 
mean relative uncertainty value in the same bins; dashed lines 
show range of values in each bin. Red line is a robust linear loess 
smoothing fit (+/- 1 standard deviation; dashed red lines). Results 
indicate a very noisy and non-linear relationship between relative 
uncertainty and the actual observed cross-validation error in spe- 
cies richness; relative uncertainty values between approximately 
50% and 60% correspond to the best relationship between pre- 
dicted and observed values at the 20x20 cell bin scale. It is rec- 
ommended that the relative uncertainty of the richness hotspot 
maps be interpreted with caution. Areas where relative uncertain- 
ty <40% appear particularly unreliable. Some of this may be an 
artifact of the large bin size (20x20 cells) required for this cross- 
validation exercise. Note log scale on both axes. 
Figure 6.D.10. Seasonal predicted species richness hotspot maps for (A) Spring, (B) Summer, (C)
Fall, and (D) Winter. Richness hotspots were calculated as described in Section 6.8.8.2. Species
and groups not modeled in a given season did not contribute to richness for that season.



Figure 6.D.9. Cross-validation: observed vs. predicted values for the annual species richness hotspot map. Observed richness at locations not included in the model training set was averaged in 20x20 cell bins and is plotted vs. the mean predicted richness in the same bins. The 1:1 line (blue line) and a robust linear loess smoothing fit (red line) are plotted. Dashed horizontal lines show the range of predictions in each bin. The x-axis is compressed compared to the y-axis for the same reasons as noted in Figure 6.D.5. The loess smoothing fit suggests that predicted species richness correlates well with observed species richness, but that observed richness is systematically lower than the prediction by a factor of approximately 3 to 4. Note log scale on both axes.
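The captions note that a correction factor could be derived from these cross-validation results. One simple way to do this, sketched below under stated assumptions, is to treat the overestimate as a single constant multiplier k (observed ≈ predicted / k) and estimate it as the geometric mean of the predicted/observed ratios across bins, which corresponds to the average vertical offset on log-log axes. This is a hypothetical illustration, not the method used in the report; the function names `multiplicative_correction` and `corrected` are invented here, and the input is assumed to be the list of (mean predicted, mean observed) pairs per 20x20 cell bin.

```python
import math
from typing import Iterable, Tuple


def multiplicative_correction(bins: Iterable[Tuple[float, float]]) -> float:
    """Estimate k such that observed ~= predicted / k from binned means.

    Assumption: a single constant factor, computed as the geometric mean
    of predicted/observed ratios. Bins with non-positive values are
    skipped because their logarithms are undefined.
    """
    log_ratios = [math.log(pred / obs)
                  for pred, obs in bins if pred > 0 and obs > 0]
    if not log_ratios:
        raise ValueError("no bins with positive predicted and observed values")
    return math.exp(sum(log_ratios) / len(log_ratios))


def corrected(prediction: float, k: float) -> float:
    """Scale a predicted richness (or diversity) value down by factor k."""
    return prediction / k
```

The same approach would apply to the diversity bins in Figure 6.D.13, where the caption suggests a smaller factor. A constant factor ignores any change in the overestimate with richness, so the loess fit itself may be the better correction where the offset is not uniform.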
Figure 6.D.11. Annual predicted seabird Shannon diversity index. The Shannon index incorporates both presence and relative abundance. See Section 6.8.8.3 for methods. Species and groups that were not modeled in any season did not contribute to the index calculation. Note that this estimate of diversity overestimates the actual value of the diversity index calculated in any given 15-minute survey for the reasons noted in the captions for Figures 6.D.3 and 6.D.7 (a correction factor could be derived from the cross-validation results in Figure 6.D.13).
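For reference, the standard Shannon index is H' = -Σ p_i ln p_i, where p_i is the proportion of total abundance contributed by species or group i. The sketch below applies that formula to the predicted abundances in a single cell. It assumes natural logarithms, that taxa with zero abundance contribute nothing, and that an empty cell is assigned zero diversity; these choices, and the function name `shannon_index`, are illustrative and may differ from the details given in Section 6.8.8.3.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        return 0.0  # assumption: a cell with no predicted birds has zero diversity
    h = 0.0
    for n in abundances:
        if n > 0:
            p = n / total  # proportion of total abundance for this taxon
            h -= p * math.log(p)
    return h
```

Two cells with the same richness can have different H' values: evenly split abundances give the maximum, ln S for S taxa, while a cell dominated by one species scores lower. This is how the index reflects relative abundance as well as presence.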
Figure 6.D.12. Cross-validation: relative uncertainty calibration plot for the annual diversity index map. The mean absolute cross-validation error calculated in 20x20 cell bins is plotted vs. the mean relative uncertainty value in the same bins; dashed lines show the range of values in each bin. The red line is a robust linear loess smoothing fit (±1 standard deviation; dashed red lines). Although the pattern is noisy, the mean absolute cross-validation error, measured in units of the Shannon diversity index, can be seen to decrease smoothly as relative uncertainty decreases. Note that the y-axis is linearly scaled and the x-axis is log scaled.
Figure 6.D.14. Seasonal predicted Shannon diversity index maps for (A) Spring, (B) Summer, (C) Fall, and (D) Winter. Diversity index values were calculated as described in Section 6.8.8.3. Species and groups not modeled in a given season did not contribute to diversity calculations for that season.
Figure 6.D.13. Cross-validation: observed vs. predicted values for the annual Shannon diversity index map. Observed Shannon diversity index at locations not included in the model training set was averaged in 20x20 cell bins and is plotted vs. the mean predicted Shannon diversity index in the same bins. The 1:1 line (blue line) and a robust linear loess smoothing fit (red line) are plotted. Dashed horizontal lines show the range of predictions in each bin. The x-axis is compressed compared to the y-axis for the same reasons as noted in Figure 6.D.5. The loess smoothing fit suggests that the predicted Shannon diversity index correlates well with the observed value of the index in 20x20 cell bins, but that observed diversity is systematically lower than the prediction by a factor of approximately 1.2 to 1.4. Note that both axes are linearly scaled.
The Center for Coastal Monitoring and Assessment's mission is to assess and forecast coastal and marine ecosystem conditions
through research and monitoring. CCMA conducts field observations on regional and national scales. The center provides the 
best available scientific information for resource managers and researchers, technical advice, and accessibility to data. For more 
information, visit: http://ccma.nos.noaa.gov/ 




